Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages
We understand that FIRE hosts many beginner-friendly workshops every year, and this
problem might not seem beginner-friendly. So, we have decided to provide participants
with a baseline model that gives them a template for steps such as importing data,
preprocessing, featurizing and classification. Participants can modify the code and
experiment with various settings. For the baseline model code, click
here.
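The four steps named above (importing data, preprocessing, featurizing, classification) can be sketched end to end in plain Python. This is not the organizers' baseline: it is a minimal stand-in, assuming a bag-of-words multinomial Naive Bayes classifier, and the toy texts and the HOF/NOT labels are hypothetical placeholders for the downloaded training data.

```python
import math
import re
from collections import Counter, defaultdict

# Step 1, importing data: a tiny in-memory stand-in for a real train file.
# These texts and the HOF/NOT labels are hypothetical placeholders.
TRAIN = [
    ("you are an idiot and a liar", "HOF"),
    ("shut up you stupid troll", "HOF"),
    ("thanks for sharing this article", "NOT"),
    ("great match today, well played", "NOT"),
]


def preprocess(text):
    """Step 2: lowercase, drop URLs and @mentions, keep word tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    return re.findall(r"\w+", text)


def featurize(tokens):
    """Step 3: bag-of-words term counts."""
    return Counter(tokens)


class NaiveBayes:
    """Step 4: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(features)
            self.vocab.update(features)
        self.total = sum(self.doc_counts.values())
        return self

    def predict(self, features):
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / self.total)  # log prior
            for word, freq in features.items():
                if word in self.vocab:  # ignore words never seen in training
                    score += freq * math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = NaiveBayes().fit([(featurize(preprocess(t)), y) for t, y in TRAIN])
print(model.predict(featurize(preprocess("what a stupid idiot @someone"))))
```

Swapping `featurize` for TF-IDF weights or `NaiveBayes` for another classifier changes only one step, which is the kind of experimentation the template is meant to allow.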
Note: the baseline model is only meant to give you a basic idea of our directory
structure and of how one can classify context-based data; there are no restrictions on
the kinds of experiments you may run. Please also note that this baseline model applies
only to the task on Identification of Conversational Hate-Speech in Code-Mixed
Languages (ICHCL), Task 1 and Task 2.
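One common way to make conversational data "context-based" is to classify each comment or reply together with the messages it responds to. The sketch below is an assumption, not the official ICHCL format or baseline: the keys `text`, `label` and `children`, the `[SEP]` separator and the HOF/NONE labels are all hypothetical. It flattens a conversation tree so that every node's input string carries its ancestors' texts.

```python
from typing import Dict, List, Tuple

SEP = " [SEP] "  # hypothetical separator between context and target text


def flatten_conversation(node: Dict, context: Tuple[str, ...] = ()) -> List[Tuple[str, str]]:
    """Walk a conversation tree depth-first and emit (input_text, label) pairs.

    Each node's input is its ancestors' texts (root first) followed by its own
    text, so a reply is classified together with the tweet and comment it
    answers. The "text", "label" and "children" keys are assumed, not official.
    """
    chain = context + (node["text"],)
    examples = [(SEP.join(chain), node["label"])]
    for child in node.get("children", []):
        examples.extend(flatten_conversation(child, chain))
    return examples


conversation = {
    "text": "Source tweet",
    "label": "NONE",
    "children": [
        {"text": "a comment", "label": "HOF",
         "children": [{"text": "a reply", "label": "HOF"}]},
    ],
}

for text, label in flatten_conversation(conversation):
    print(label, "|", text)
# HOF | Source tweet [SEP] a comment [SEP] a reply  (last line printed)
```

The flattened strings can then go through any ordinary text-classification pipeline; truncating long chains to the most recent ancestors is one design choice worth experimenting with.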
Category | Train Dataset | Test Dataset |
---|---|---|
English Dataset | Download | Download |
Hindi Dataset | Download | Download |
Marathi Dataset | Download | Download |
Category | Train Dataset | Test Dataset |
---|---|---|
English-Hindi Code-Mix Dataset | Download | Download |
To know more about the subtasks, click here.
Category | Link |
---|---|
English Dataset | Download |
Hindi Dataset | Download |
German Dataset | Download |
To know more, click here.
Category | Link |
---|---|
English Dataset | Download |
Hindi Dataset | Download |
German Dataset | Download |
To know more, click here.