HASOC

Datasets and Baseline Model


Baseline Model

FIRE hosts many beginner-friendly workshops every year, and we understand that this problem may not seem beginner-friendly. We have therefore decided to provide a baseline model that gives participants a template for steps such as importing data, preprocessing, feature extraction and classification. Participants can modify the code and experiment with various settings. For the baseline model code, click here.

Note: The baseline model is only meant to give you a basic idea of our directory structure and of how one can classify context-based data; there are no restrictions on the kinds of experiments you may run. Please also note that this baseline model is only for the task on Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) (Task 1 and Task 2).
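The four steps mentioned above (importing data, preprocessing, feature extraction, classification) can be illustrated with a minimal, self-contained sketch in plain Python. This is not the official baseline code: the toy records, the (parent_text, text, label) layout, the "HOF"/"NOT" label names, the ctx_ prefix for context tokens and the choice of a naive Bayes classifier are all assumptions made for illustration. Macro-averaged F1 is included as one common way to score such a classifier.

```python
import math
import re
import string
from collections import Counter

# Toy records, purely hypothetical: (parent_text, text, label).
# The real ICHCL files, field names and label set may differ; "HOF"/"NOT" is assumed here.
TRAIN = [
    ("", "you are an idiot", "HOF"),
    ("", "what an idiot, shut up", "HOF"),
    ("", "have a nice day", "NOT"),
    ("", "thanks, nice work", "NOT"),
]


def preprocess(text):
    """Lowercase, drop URLs and @mentions, split on whitespace, strip punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    tokens = (tok.strip(string.punctuation) for tok in text.split())
    return [tok for tok in tokens if tok]


def featurize(parent, text):
    """Bag of words for the reply, plus parent-post tokens marked with a ctx_ prefix."""
    feats = Counter(preprocess(text))
    feats.update("ctx_" + tok for tok in preprocess(parent))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, feats):
        def log_score(c):
            denom = self.totals[c] + self.vocab_size
            return self.log_prior[c] + sum(
                n * math.log((self.counts[c][tok] + 1) / denom)
                for tok, n in feats.items()
            )
        return max(self.classes, key=log_score)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    X = [featurize(p, t) for p, t, _ in TRAIN]
    y = [label for _, _, label in TRAIN]
    model = NaiveBayes().fit(X, y)
    print(model.predict_one(featurize("", "such an idiot")))  # prints HOF
```

Prefixing parent-post tokens with a separate marker is one simple way to let the classifier see conversational context without mixing it with the reply's own words; the actual baseline may handle context quite differently.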

Dataset

Datasets for Tasks 1, 2, 3 and 4 will be available on their respective sites.


HASOC 2022 Dataset

Task

Training Data

Test Data

Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) (Task 1 and Task 2)

Download

Offensive Language Identification in Marathi (Task-3A, 3B, 3C)

Download

HASOC 2021 Dataset