Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages
Datasets and Baseline Model
Baseline Model
FIRE hosts many beginner-friendly workshops every year, and we understand that this problem may not seem beginner-friendly. We have therefore decided to provide a baseline model that gives participants a template for steps such as importing data, preprocessing, featurization, and classification. Participants can modify the code and experiment with various settings. For the baseline model code, click here.
Note: The baseline model is only meant to give you a basic idea of our directory structure and of how context-based data can be classified; there are no restrictions on the kinds of experiments you may run. Please also note that this baseline model applies only to the task on Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) (Task 1 and Task 2).
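To make the steps named above (importing data, preprocessing, featurization, classification) concrete, here is a minimal sketch. It is not the organizers' baseline: the toy threads, the `HOF`/`NONE` label names, the helper names (`preprocess`, `flatten`, `NaiveBayes`), and the choice of a bag-of-words Naive Bayes classifier are all assumptions made for illustration. Context is handled in the simplest possible way, by joining a parent post with its replies into one text.

```python
import math
import re
import string
from collections import Counter, defaultdict

def preprocess(text):
    """Lowercase, drop URLs and @mentions, split on whitespace, strip punctuation."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = (tok.strip(string.punctuation) for tok in text.split())
    return [tok for tok in tokens if tok]

def flatten(thread):
    """Join a parent post with its comments and replies so context is kept."""
    return " ".join(thread)

class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(preprocess(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        tokens = preprocess(doc)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy data: each item is a thread (parent post, then replies).
train_threads = [
    ["you are an idiot", "totally stupid idiot"],
    ["what a lovely day", "great weather indeed"],
    ["stupid people everywhere", "idiot move"],
    ["lovely photo", "great shot, thanks"],
]
train_labels = ["HOF", "NONE", "HOF", "NONE"]  # assumed label names

model = NaiveBayes().fit([flatten(t) for t in train_threads], train_labels)
prediction = model.predict(flatten(["nice post", "you stupid idiot"]))
print(prediction)
```

In practice you would replace the toy lists with the released training data and swap the classifier or features for anything you want to experiment with.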
Dataset
HASOC 2022 Dataset
HASOC 2021 Dataset
Subtask 1 Dataset
Category
Train Dataset
Test Dataset
Subtask 2 Dataset
To know more about the subtasks, click here.
HASOC 2020 Dataset
To know more click here.
HASOC 2019 Dataset
To know more click here.
Contact us
Subscribe to our mailing list for the latest announcements and discussions.
For any queries, write to us at hasoc@googlegroups.com.