HASOC (2022)

Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

A conversational thread can contain hate, offensive, and profane content that is not apparent from a standalone tweet, comment, or reply, but that can be identified given the context of the parent content.

Example-1

The above screenshot from Twitter illustrates the problem at hand. The parent/source tweet, which was posted at 2:30 am on May 11th, expresses hate and profanity towards Muslim countries regarding the controversy during the recent Israel-Palestine conflict. The two comments on the tweet read "Amine", which means trustworthy or honest in Arabic. Analyzed for hate or offensive speech without the context of the parent tweet, the two comments would not be classified as hate or offensive content. Taken in the context of the conversation, however, the comments support the hate/profanity expressed in the parent tweet, so they are labelled as hate/offensive/profane.

This sub-task focused on the binary classification (task 2) of such conversational tweets with tree-structured data into:
  • (NOT) Non Hate-Offensive - This tweet, comment, or reply does not contain any hate speech, profane, or offensive content.
  • (HOF) Hate and Offensive - This tweet, comment, or reply contains hate, offensive, and profane content in itself or supports hate expressed in the parent tweet.
Furthermore, this year for the Hinglish language, we’re introducing a multiclass task (task 3) that further divides the HOF tweets into 3 subclasses:
  • (SHOF) Standalone Hate - This tweet, comment, or reply contains hate, offensive, and profane content in itself.
  • (CHOF) Contextual Hate - The comment or reply supports the hate, offense, and profanity expressed in its parent. This includes affirming the hate with positive sentiment (example-2) and having apparent hate (example-3).
  • (NONE) Non-Hate - This tweet, comment, or reply does not contain hate, offensive, or profane content in itself.
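
The relationship between the two label sets can be sketched as a small lookup table. This is only an illustration: the mapping of NONE to NOT is an assumption inferred from the class descriptions above, and the names `CONTEXTUAL_TO_BINARY` and `to_binary` are hypothetical, not part of the released data or tooling.

```python
# Assumed correspondence between the multiclass labels and the binary labels.
# SHOF and CHOF are both subclasses of HOF; NONE -> NOT is an assumption.
CONTEXTUAL_TO_BINARY = {
    "SHOF": "HOF",
    "CHOF": "HOF",
    "NONE": "NOT",
}


def to_binary(contextual_label):
    """Collapse a multiclass label (SHOF/CHOF/NONE) into a binary one (HOF/NOT)."""
    return CONTEXTUAL_TO_BINARY[contextual_label]


for label in ("SHOF", "CHOF", "NONE"):
    print(label, "->", to_binary(label))
```

An unknown label raises a `KeyError` here, which makes a malformed label file fail loudly instead of being silently mislabelled.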

Example-2

Another such example, this time with code-mixed text:
  • The Source Tweet: Modi Ji COVID situation ko solve karne ke liye ideas maang rahe the. Mera idea hai resignation dedo please…
  • Translation: Modi ji (PM of India) was asking for ideas to solve the COVID situation in India. My idea for him is to resign.
  • The Comment: Doctors aur Scientists se manga hai. Chutiyo se nahi. Baith niche. [HOF/SHOF]
  • Translation: They have asked Doctors and Scientists. Not fuckers. Sit down. [HOF/SHOF]
  • The reply: You totally nailed it, can’t stop laughing. [HOF/CHOF]
The reply has a positive sentiment, but it is positive in favour of the hate that the comment expresses towards the author of the source tweet. It therefore supports the hate expressed in the comment and is itself labelled as hate speech.

This is the type of problem we’re aiming to solve via this shared task.

Example-3

Example-4

In Example-4 above, the main tweet expresses hate against a religion. The comment, in contrast, is hateful towards the author of the tweet and does not support the hate expressed in the main tweet. This is an example of two levels each containing standalone hate.

Sampling and annotating social media conversation threads is very challenging. To minimize the effect of bias, we hand-picked controversial stories on diverse topics that have a high probability of containing hate, offensive, and profane posts.

The controversial stories are as follows:

  • Celebrity Controversy.
  • Temple-Mosque Controversy.
  • Taliban.
  • Covid Controversy.
  • Caste.
  • Common Civil Code.
  • Hinduphobia.
  • Namaz in public places.
  • Bullibai.
  • Farmer Protest.
  • Cricket Controversy.
  • Historical Hindu Muslim.
  • Islamophobia.
  • CAA.
  • Russian-Ukrainian conflict.
  • Jew.
  • Kashmir.
  • Ozil.

The directory structure of the data directory:

[Figure: HASOC data directory structure]

The structure of data.json:

[Figure: structure of data.json]

In the figure, rectangles are keys and ovals are elements of the array represented by the parent key.

The contents of the various keys are as follows:
  • tweet: the text that is contained in the tweet
  • tweet_id: a global tweet_id generated by Twitter
  • comments: array of comments that a tweet has
  • replies: array of replies that a comment has
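
As a rough illustration of how this tree might be traversed, the sketch below flattens one thread into a list of records, each carrying the text of its ancestors as context. It assumes that comments and replies are objects with their own `tweet` and `tweet_id` keys, which the key descriptions above suggest but do not state outright. The function `flatten_thread`, the sample thread, and its IDs are hypothetical.

```python
def flatten_thread(thread):
    """Return one record per tweet, comment, and reply in a conversation tree."""
    source_text = thread["tweet"]
    # The source tweet has no parent, so its context is empty.
    records = [{"tweet_id": thread["tweet_id"], "text": source_text, "context": []}]
    for comment in thread.get("comments", []):
        records.append({
            "tweet_id": comment["tweet_id"],
            "text": comment["tweet"],
            "context": [source_text],
        })
        for reply in comment.get("replies", []):
            records.append({
                "tweet_id": reply["tweet_id"],
                "text": reply["tweet"],
                "context": [source_text, comment["tweet"]],
            })
    return records


# Made-up thread with placeholder text and IDs, only to show the shape.
sample_thread = {
    "tweet_id": "1001",
    "tweet": "source tweet text",
    "comments": [
        {
            "tweet_id": "1002",
            "tweet": "comment text",
            "replies": [{"tweet_id": "1003", "tweet": "reply text"}],
        },
        {"tweet_id": "1004", "tweet": "another comment"},
    ],
}

for record in flatten_thread(sample_thread):
    print(record["tweet_id"], len(record["context"]), record["text"])
```

Keeping the ancestor texts alongside each record matters for this task, because a comment or reply may only be recognisable as hateful when read together with its parent.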

The structure of labels.json, binary_labels.json, and contextual_labels.json is linear: they contain no nested data structures, only key-value pairs in which the key is a tweet id and the value is the label for the tweet with that id. binary_labels.json (Hinglish) and labels.json (German) are used for task-1 (binary classification), and contextual_labels.json is used for task-2 (multiclass classification).
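
Because the label files are flat maps from tweet id to label, joining them with tweet records amounts to a dictionary lookup. The sketch below shows one possible way to do this. The in-memory JSON string stands in for a label file, and the IDs, labels, records, and the helper `attach_labels` are all invented for illustration.

```python
import json

# Stand-in for the contents of a label file; IDs and labels are invented.
labels_json = '{"1001": "HOF", "1002": "NOT", "1003": "HOF"}'
labels = json.loads(labels_json)

# Hypothetical tweet records; one ID is an int and one has no label.
records = [
    {"tweet_id": "1001", "text": "source tweet text"},
    {"tweet_id": 1002, "text": "comment text"},
    {"tweet_id": "1004", "text": "another comment"},
]


def attach_labels(records, labels):
    """Pair each record with its label; collect IDs that have no label."""
    labelled, missing = [], []
    for record in records:
        # JSON object keys are always strings, so normalise the ID first.
        key = str(record["tweet_id"])
        if key in labels:
            labelled.append({**record, "label": labels[key]})
        else:
            missing.append(key)
    return labelled, missing


labelled, missing = attach_labels(records, labels)
print(labelled)
print("unlabelled ids:", missing)
```

Returning the unlabelled IDs separately, rather than dropping them silently, makes it easier to spot a mismatch between a data file and its label file.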



Subscribe to our mailing list for the latest announcements and discussions.

For any queries, write to us at hasoc2019@googlegroups.com

  • hirenmadhu16@gmail.com - Hiren Madhu, Indian Institute of Science, Bangalore, India

  • shreysatapara@gmail.com - Shrey Satapara, Indian Institute of Technology, Hyderabad, India