Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages
Call for Participation
Task 1: Identifying Hate, offensive and profane content
Task 1A: Identifying Hate, offensive and profane content in Sinhala
This task focuses on hate speech and offensive language identification for Sinhala. Sinhala is a low-resource Indo-Aryan language spoken by over 17 million people in Sri Lanka, where it is one of the two official languages. HASOC 2023 brings the first-ever shared task for Sinhala natural language processing.
This is a coarse-grained binary classification task in which participating systems are required to classify tweets into two classes: Hate and Offensive (HOF) and Non Hate-Offensive (NOT).
(NOT) Non Hate-Offensive - This post does not contain any hate speech or profane or offensive content.
(HOF) Hate and Offensive - This post contains hateful, offensive, or profane content.
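As a rough illustration of the two-class scheme, the sketch below maps the HOF/NOT labels to integers and scores a set of predictions. The call does not name the evaluation metric, so macro-averaged F1 is an assumption made here for illustration only; the helper names `encode` and `macro_f1` are hypothetical and not part of any official HASOC tooling.

```python
LABELS = ("NOT", "HOF")  # the two classes named in the task description


def encode(label: str) -> int:
    """Map a label string to 0 (NOT) or 1 (HOF); reject anything else."""
    cleaned = label.strip().upper()
    if cleaned not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return LABELS.index(cleaned)


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores.

    Assumption: the metric is chosen for illustration; the call for
    participation does not state how runs are scored.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for cls in range(len(LABELS)):
        tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
        denom = 2 * tp + fp + fn
        # A class that never appears in gold or pred contributes 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = [encode(x) for x in ["HOF", "NOT", "HOF", "NOT"]]
pred = [encode(x) for x in ["HOF", "NOT", "NOT", "NOT"]]
print(round(macro_f1(gold, pred), 4))
```

Averaging the per-class scores without weighting gives the minority class the same influence as the majority class, which is one common reason to prefer it on imbalanced data.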
The train/test sets are based on the recently released SOLD: Sinhala Offensive Language Detection dataset [1].
[1] Ranasinghe, T., Anuradha, I., Premasiri, D., Silva, K., Hettiarachchi, H., Uyangodage, L. and Zampieri, M., 2022. SOLD: Sinhala Offensive Language Dataset. arXiv preprint arXiv:2212.00851.
Task 1B: Identifying Hate, offensive and profane content in Gujarati
This task focuses on hate speech and offensive language identification for Gujarati. Gujarati is a low-resource Indo-Aryan language with around 50 million native speakers and is one of the 22 official languages of India.
This is a coarse-grained binary classification task in a few-shot setting, in which participating systems are required to classify tweets into two classes: Hate and Offensive (HOF) and Non Hate-Offensive (NOT).
(NOT) Non Hate-Offensive - This post does not contain any hate speech or profane or offensive content.
(HOF) Hate and Offensive - This post contains hateful, offensive, or profane content.
The train set contains ~200 tweets, and participants can explore various techniques to improve their systems in a few-shot setting.
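With only about 200 training tweets, a single held-out split leaves very few examples to validate on. One option participants might consider is stratified k-fold cross-validation, sketched below with the standard library only. The function name `stratified_folds` and the choice of five folds are assumptions for illustration, not part of the task definition.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Return k lists of example indices with a similar label mix in each."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the folds reproducible
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a share of each class;
        # the offset continues the rotation so fold sizes stay balanced.
        for i, idx in enumerate(indices):
            folds[(i + offset) % k].append(idx)
        offset += len(indices)
    return folds
```

Each fold can then serve once as the validation set while the remaining folds are used for training, so every labelled tweet contributes to both training and validation.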
Dataset
Note: The datasets are password protected. Please register to get access; once registered, you will receive the passwords from noreply.hasoc@gmail.com. If you have not received the passwords after registering, please check your spam folder. For any queries, please reach out to us at hasocfire@googlegroups.com.
Results
Important Dates
Timeline:
Each subtrack has an independent timeline for training/test data release and run submission, which will be available on its respective website. The common timeline is as follows:
15th July - Training data release
15th August - Test set release and run submissions start
23rd August - Registration deadline
29th August - Deadline for run submissions
31st August - Results announcement
20th September - Paper submission deadline
5th October - Review distribution
15th October - Revised paper submission
NOTE: All dates are in the AoE (Anywhere on Earth) time zone.
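AoE corresponds to UTC-12, so a deadline passes only once the date has ended everywhere on Earth. The sketch below converts an AoE date to UTC under two assumptions that the call does not state explicitly: that the year is 2023 (taken from "HASOC 2023") and that each deadline runs until the last second of the listed day. The function name `aoe_deadline_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth is a fixed offset of UTC-12 with no daylight saving.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_utc(year: int, month: int, day: int) -> datetime:
    """Last second of the given calendar day in AoE, expressed in UTC."""
    end_of_day = datetime(year, month, day, 23, 59, 59, tzinfo=AOE)
    return end_of_day.astimezone(timezone.utc)


# Run submission deadline, assuming the year 2023.
print(aoe_deadline_utc(2023, 8, 29).isoformat())  # 2023-08-30T11:59:59+00:00
```

Under these assumptions, the 29th August run submission deadline falls at 11:59:59 UTC on 30th August.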
Organisers
Student Coordinator
Acknowledgement
We would like to thank the AI Journal - Funding Opportunities for Promoting AI Research for supporting HASOC Task 1.
Contact us
Subscribe to our mailing list for the latest announcements and discussions.
For any queries, write to us at hasoc@googlegroups.com.