HASOC (2023)

Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

Datasets and Baseline Model

Baseline Model

We understand that FIRE hosts many beginner-friendly workshops every year, and this problem might not seem beginner-friendly. We have therefore decided to provide a baseline model that gives participants a template for steps such as importing data, preprocessing, feature extraction, and classification. Participants can modify the code and experiment with various settings. For the baseline model code, click here.

Note: The baseline model is only meant to give you a basic idea of our directory structure and of how one can classify context-based data; there are no restrictions on the kinds of experiments you may run. Please also note that this baseline model applies only to the task on Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) (Task 1 and Task 2).
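To make the steps named above concrete (importing data, preprocessing, feature extraction, classification), here is a minimal sketch in plain Python. It is not the official baseline code. The thread fields (`tweet`, `comment`, `reply`), the labels `HOF` and `NONE`, the toy examples, and the choice of a bag-of-words Naive Bayes classifier are all assumptions made for illustration; the real data layout and model may differ.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy threads; field names and labels are assumptions.
TRAIN = [
    ({"tweet": "match tonight", "comment": "great game everyone",
      "reply": "agreed well played"}, "NONE"),
    ({"tweet": "match tonight", "comment": "these fans are stupid idiots",
      "reply": ""}, "HOF"),
    ({"tweet": "new policy announced", "comment": "thanks for sharing",
      "reply": "useful update"}, "NONE"),
    ({"tweet": "new policy announced", "comment": "shut up you idiots",
      "reply": "stupid people"}, "HOF"),
]


def flatten(thread):
    """Join the parent tweet, comment, and reply so the context is kept."""
    return " ".join(thread.get(key, "") for key in ("tweet", "comment", "reply"))


def preprocess(text):
    """Lowercase, drop URLs and @mentions, and split into word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"\w+", text)


def train(examples):
    """Collect per-label token counts for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)  # label -> token counts
    label_counts = Counter()            # label -> number of threads
    vocab = set()
    for thread, label in examples:
        tokens = preprocess(flatten(thread))
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, thread):
    """Return the label with the highest add-one-smoothed log probability."""
    word_counts, label_counts, vocab = model
    tokens = preprocess(flatten(thread))
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
new_thread = {"tweet": "match tonight", "comment": "stupid idiots", "reply": ""}
print(predict(model, new_thread))
```

Flattening the whole thread before tokenizing is one simple way to give the classifier the conversational context of a reply; any of the four steps can be swapped for a stronger alternative without changing the others.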

Dataset

Datasets for Tasks 1, 2, 3, and 4 will be available on their respective sites.

HASOC 2022 Dataset

Task | Training Data | Test Data
Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) (Task 1 and Task 2) | Download | Download
Offensive Language Identification in Marathi (Task 3A, 3B, 3C) | Download | Download

HASOC 2021 Dataset

Subtask 1 Dataset

Category | Train Dataset | Test Dataset
English Dataset | Download | Download
Hindi Dataset | Download | Download
Marathi Dataset | Download | Download

Subtask 2 Dataset

Category | Train Dataset | Test Dataset
English-Hindi Code-Mix Dataset | Download | Download

To know more about the subtasks, click here.

HASOC 2020 Dataset

Category | Link
English Dataset | Download
Hindi Dataset | Download
German Dataset | Download

To know more, click here.

HASOC 2019 Dataset

Category | Link
English Dataset | Download
Hindi Dataset | Download
German Dataset | Download

To know more, click here.

Contact us

Subscribe to our mailing list for the latest announcements and discussions.

For any queries, write to us at hasoc@googlegroups.com