HASOC (2021)

Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

Task Description:
 
HASOC provides a forum and a data challenge for multilingual research on the identification of problematic content. This year, we offer two subtasks, each with its own dataset. Subtask 2 is a brand-new problem offered this year. Both datasets are sampled from Twitter. Subtask 1 is offered in English and Hindi with two problems each (1A and 1B), and in Marathi with one problem (1A). The Subtask 2 dataset contains English, Hindi, and code-mixed Hindi tweets. Participants in this year's shared task can choose to participate in one or both subtasks. Participants can also look at the openly available data from HASOC 2019 and 2020. The data is accessible at https://hasocfire.github.io/hasoc/2021/dataset.html
 
Subtask 1A: Identifying hate, offensive, and profane content in a post

Subtask 1A focuses on hate speech and offensive language identification and is offered for English, Hindi, and Marathi. It is a coarse-grained binary classification task in which participating systems are required to classify tweets into two classes: Hate and Offensive (HOF) and Non Hate-Offensive (NOT).
  • (NOT) Non Hate-Offensive - This post does not contain any hate speech, profane, or offensive content.
  • (HOF) Hate and Offensive - This post contains hate, offensive, or profane content.
Subtask 1B: Discrimination between hate, profane, and offensive posts
This subtask is a fine-grained classification task offered for English and Hindi. Posts labeled HOF in Subtask 1A are further classified into three categories:
  • (HATE) Hate speech - Posts under this class contain hate speech content.
  • (OFFN) Offensive - Posts under this class contain offensive content.
  • (PRFN) Profane - These posts contain profane words.
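The two label sets form a simple hierarchy: every post labeled HATE, OFFN, or PRFN in Subtask 1B is an HOF post from Subtask 1A, and NOT posts receive no 1B label. A minimal Python sketch of that relationship follows; the class and function names are hypothetical, and representing a missing 1B label as None is an assumption for illustration, not the official data format.

```python
from enum import Enum
from typing import Optional


class CoarseLabel(Enum):
    """Subtask 1A labels (binary)."""
    NOT = "NOT"  # Non Hate-Offensive
    HOF = "HOF"  # Hate and Offensive (hate, offensive, or profane)


class FineLabel(Enum):
    """Subtask 1B labels, defined only for posts labeled HOF in 1A."""
    HATE = "HATE"  # hate speech
    OFFN = "OFFN"  # offensive
    PRFN = "PRFN"  # profane


def coarse_from_fine(fine: Optional[FineLabel]) -> CoarseLabel:
    """Derive the 1A label from a 1B label.

    Assumption: None stands for "no 1B label", i.e. the post is NOT.
    """
    return CoarseLabel.NOT if fine is None else CoarseLabel.HOF


def check_consistent(coarse: CoarseLabel, fine: Optional[FineLabel]) -> bool:
    """A 1B label must be present exactly when the 1A label is HOF."""
    return coarse_from_fine(fine) is coarse
```

For example, coarse_from_fine(FineLabel.PRFN) returns CoarseLabel.HOF, and check_consistent(CoarseLabel.NOT, FineLabel.HATE) returns False, flagging a pair of labels that contradicts the hierarchy.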
 
Subtask 2: Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL)
A conversational thread can contain hate and offensive content that is not apparent from a single comment or reply on its own, but that can be identified given the context of the parent content.
For example, consider a thread whose parent tweet expresses hate and profanity towards Muslim countries regarding the controversy happening in Israel at the time. Two comments on the tweet say “Amine”, which means “truthfully” in Persian. On their own these comments are not offensive, but in the context of the parent they support the hate. This subtask focuses on the binary classification of such conversational tweets, provided as tree-structured data, into:
  • (NOT) Non Hate-Offensive - This tweet, comment, or reply does not contain any hate speech, profane, or offensive content.
  • (HOF) Hate and Offensive - This tweet, comment, or reply contains hate, offensive, or profane content itself, or supports hate expressed in its parent.
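Because the label of a comment or reply can depend on its parent, one simple way to prepare such threads for a classifier is to prepend each post's ancestor texts to its own text. The Python sketch below assumes a hypothetical Post tree structure and an arbitrary " [SEP] " separator; it is one illustrative way to expose parent context, not the official ICHCL data format or baseline.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Post:
    """One node of a conversation tree: a tweet, comment, or reply.

    Hypothetical schema for illustration only.
    """
    post_id: str
    text: str
    children: List["Post"] = field(default_factory=list)


def with_context(root: Post, sep: str = " [SEP] ") -> Iterator[Tuple[str, str]]:
    """Yield (post_id, input_text) for every node in pre-order.

    Each post's text is prefixed with the texts of its ancestors
    (root first), so a classifier also sees the parent context.
    """
    # Stack of (node, ancestor texts from the root down to the node's parent).
    stack: List[Tuple[Post, List[str]]] = [(root, [])]
    while stack:
        node, ancestors = stack.pop()
        yield node.post_id, sep.join(ancestors + [node.text])
        # Push children in reverse so they are visited left to right.
        for child in reversed(node.children):
            stack.append((child, ancestors + [node.text]))
```

For a thread whose parent tweet has a reply reading “Amine”, that reply's input becomes the parent's text, the separator, and then “Amine”, so a model can judge the reply in the context of the hate it supports.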