HASOC

Task 1: Identifying Hate, offensive and profane content

Task 1A: Identifying Hate, offensive and profane content in Sinhala

This task focuses on hate speech and offensive language identification for Sinhala. Sinhala is a low-resource Indo-Aryan language spoken by over 17 million people in Sri Lanka, where it is one of the two official languages. HASOC 2023 brings the first-ever shared task for Sinhala natural language processing.

This is a coarse-grained binary classification task in which participating systems are required to classify tweets into two classes: Hate and Offensive (HOF) and Non Hate-Offensive (NOT).

(NOT) Non Hate-Offensive - This post does not contain any hate speech, profane, or offensive content.
(HOF) Hate and Offensive - This post contains hate, offensive, or profane content.

The train/test sets are based on the recently released SOLD: Sinhala Offensive Language Dataset [1].

[1] Ranasinghe, T., Anuradha, I., Premasiri, D., Silva, K., Hettiarachchi, H., Uyangodage, L. and Zampieri, M., 2022. SOLD: Sinhala Offensive Language Dataset. arXiv preprint arXiv:2212.00851.

Task 1B: Identifying Hate, offensive and profane content in Gujarati

This task focuses on hate speech and offensive language identification for Gujarati. Gujarati is a low-resource Indo-Aryan language with around 50M native speakers and is one of the 22 official languages of India.

This is a coarse-grained binary classification task in a few-shot setting, in which participating systems are required to classify tweets into two classes: Hate and Offensive (HOF) and Non Hate-Offensive (NOT).

(NOT) Non Hate-Offensive - This post does not contain any hate speech, profane, or offensive content.
(HOF) Hate and Offensive - This post contains hate, offensive, or profane content.

The train set contains ~200 tweets, and participants can explore various techniques to improve their systems in this few-shot setting.
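To make the HOF/NOT labelling concrete, here is a minimal sketch of one possible baseline for a small training set: a nearest-centroid classifier over character trigrams, using only the Python standard library. It is an illustration and not an official or recommended HASOC approach. The function names (char_ngrams, train_centroids, predict) and the English toy examples are assumptions made up for this sketch; they are not taken from the task data.

```python
from collections import Counter
import math

# The two class labels defined by the task.
LABELS = ("HOF", "NOT")


def char_ngrams(text, n=3):
    """Count character n-grams of a text, padded with one space on each side."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(gram, 0) for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(examples):
    """Sum the n-gram counts of all training texts per label."""
    centroids = {label: Counter() for label in LABELS}
    for text, label in examples:
        centroids[label].update(char_ngrams(text))
    return centroids


def predict(centroids, text):
    """Return the label whose centroid is most similar to the text."""
    grams = char_ngrams(text)
    return max(LABELS, key=lambda label: cosine(grams, centroids[label]))


# Hypothetical toy data, standing in for the real (password-protected) tweets.
toy_train = [
    ("you are an idiot", "HOF"),
    ("have a nice day", "NOT"),
]
toy_centroids = train_centroids(toy_train)
print(predict(toy_centroids, "nice day"))
```

Character n-grams are used here because they need no language-specific tokenizer, which can be convenient for low-resource languages; a real submission would likely need a considerably stronger model.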

Dataset

Note: The datasets are password protected. Please register to get access; once registered, you will receive the passwords from noreply.hasoc@gmail.com. If you have not received the passwords after registering, please check your spam folder. For any queries, reach out to us at hasocfire@googlegroups.com.

Task

Training Data

Test Data

Task 1A: Identifying Hate, offensive and profane content in Sinhala

Download

Task 1B: Identifying Hate, offensive and profane content in Gujarati

Download

Results

Task 1A Sinhala

Task 1B Gujarati

Best Run Submitted Per Team Per Task

Download

All Runs Submitted Per Task

Download

Important Dates

Timeline:

All subtracks have independent timelines for training/test data release and run submission, which will be available on their respective websites. The common timeline is as follows:

15th July :- Training data release
15th August :- Test set release and run submissions start
23rd August :- Registration deadline
29th August :- Deadline for run submissions
31st August :- Results announcement
20th September :- Paper submission deadline
5th October :- Review distribution
15th October :- Revised paper submission

NOTE: All dates are in the AoE (Anywhere on Earth) timezone.

Organisers

Thomas Mandl :- University of Hildesheim, Germany
Sandip Modha :- DA-IICT & LDRP-ITR, Gandhinagar, India
Shrey Satapara :- IIT Hyderabad, Hyderabad, India
Hiren Madhu :- IISc Bangalore, Bangalore, India
Prasenjit Majumder :- DA-IICT, Gandhinagar, India
Tharindu Ranasinghe :- University of Wolverhampton, UK
Marcos Zampieri :- Rochester Institute of Technology, USA
Alphaeus Eric Dmonte :- George Mason University, USA
Pavan Pandya :- LDRP-ITR, Gandhinagar, India

Student Coordinators

Nisarg Shah :- VSITR, Gandhinagar, India
Jagrat Patel :- LDRP-ITR, Gandhinagar, India
Jaivin Barot :- LDRP-ITR, Gandhinagar, India

Acknowledgement

We would like to thank the AI Journal - Funding Opportunities for Promoting AI Research for supporting HASOC Task 1.

Contact us

Subscribe to our mailing list for the latest announcements and discussions.

For any queries, write to us at hasoc@googlegroups.com.