HASOC (2022)

Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

We understand that FIRE hosts many beginner-friendly workshops every year, and that this problem might not seem beginner-friendly. So we have decided to provide a baseline model that gives participants a template for steps such as importing data, preprocessing, featurization, and classification. Participants can change the code and experiment with various settings. For the baseline model code, click here.

Note: The baseline model is only meant to give you a basic idea of our directory structure and of how context-based data can be classified; there are no restrictions on the kinds of experiments you run. Please also note that this baseline model covers only the task on Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) (Task 1 and Task 2).
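As a rough illustration of the four steps named above (importing data, preprocessing, featurization, classification), here is a minimal sketch. It is not the organizers' baseline: the toy examples, the `HOF`/`NONE` label names, and the helper functions `preprocess`, `featurize`, `train`, and `predict` are all assumptions made for illustration, and a small hand-written Naive Bayes classifier stands in for whatever model the baseline actually uses. The one idea it tries to show is context-based classification: the reply is featurized together with the post it responds to.

```python
import math
import re
from collections import Counter, defaultdict

# Step 1: "import" data. Hypothetical toy rows of (parent post, reply, label);
# the real dataset layout and label names are assumptions here.
TRAIN = [
    ("they lost the match", "these people are idiots and trash", "HOF"),
    ("new policy announced", "you are all stupid trash", "HOF"),
    ("they lost the match", "better luck next time team", "NONE"),
    ("new policy announced", "thanks for sharing the update", "NONE"),
]


def preprocess(text):
    """Step 2: lowercase, drop URLs and @mentions, split into word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"\w+", text)


def featurize(parent, reply):
    """Step 3: bag of words over the reply plus its conversational context."""
    return Counter(preprocess(parent + " " + reply))


def train(examples):
    """Step 4a: collect per-label word counts for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for parent, reply, label in examples:
        word_counts[label].update(featurize(parent, reply))
        label_counts[label] += 1
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, parent, reply):
    """Step 4b: return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word, n in featurize(parent, reply).items():
            if word in vocab:  # ignore words never seen in training
                score += n * math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
prediction = predict(model, "they lost the match", "you idiots are trash")
print(prediction)
```

In practice you would replace `TRAIN` with rows loaded from the downloaded training files and swap the classifier for any model you want to experiment with; the preprocessing and context-concatenation steps can stay the same.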

Task | Training Data | Test Data
Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) (Task 1 and Task 2) | Download | Download
Offensive Language Identification in Marathi (Task 3A, 3B, 3C) | Download | Download

Subtask 1 Dataset


Category | Train Dataset | Test Dataset
English Dataset | Download | Download
Hindi Dataset | Download | Download
Marathi Dataset | Download | Download

Subtask 2 Dataset


Category | Train Dataset | Test Dataset
English-Hindi Code-Mix Dataset | Download | Download

To know more about the subtasks, click here.

Category | Link
English Dataset | Download
Hindi Dataset | Download
German Dataset | Download

To know more, click here.
