HASOC (2023)

Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

Call for Participation

Task 2: Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL)

A conversational thread can also contain hate, offensive, and profane content that is not apparent from a single tweet, comment, or reply on its own, but that can be identified given the context of the parent content.

The screenshot above, taken from Twitter, illustrates the problem. The parent (source) tweet, posted at 2:30 am on May 11th, expresses hate and profanity towards Muslim countries in connection with the controversy during the recent Israel-Palestine conflict. The two comments on the tweet both say "Amine", which means trustworthy or honest in Arabic. If these comments were analyzed for hate or offensive speech without the context of the parent tweet, they would not be classified as hate or offensive content. In the context of the conversation, however, the comments support the hate and profanity expressed in the parent tweet, so they are labelled as hate/offensive/profane.

This sub-task focuses on the binary classification (Task 2A) of such conversational tweets, which come as tree-structured data, into:

(NOT) Non Hate-Offensive - This tweet, comment, or reply does not contain any hate speech, profane, or offensive content.

(HOF) Hate and Offensive - This tweet, comment, or reply contains hate, offensive, or profane content in itself, or supports hate expressed in the parent tweet.
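Because a label can depend on the thread above it, one way to turn such tree-structured data into classification instances is to pair every tweet, comment, or reply with the text of its ancestors. The sketch below illustrates that idea under stated assumptions: the `Node` class, its field names, and the `[SEP]` separator are hypothetical and do not describe the official ICHCL file format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional

SEP = " [SEP] "  # hypothetical separator between thread levels


@dataclass
class Node:
    """One tweet, comment, or reply (hypothetical schema, not the ICHCL format)."""
    text: str
    label: Optional[str] = None  # e.g. "NOT" or "HOF" for the binary task
    children: list[Node] = field(default_factory=list)


def with_context(root: Node) -> Iterator[tuple[str, Optional[str]]]:
    """Yield (input_text, label) for every node, prefixed by its ancestors' text."""
    # Iterative depth-first traversal; each stack entry carries the ancestor texts.
    stack: list[tuple[Node, list[str]]] = [(root, [])]
    while stack:
        node, ancestors = stack.pop()
        chain = ancestors + [node.text]  # new list, so siblings never share state
        yield SEP.join(chain), node.label
        # Push children in reverse so they are visited in their original order.
        for child in reversed(node.children):
            stack.append((child, chain))
```

For a source tweet with one comment and one reply to that comment, this yields three instances, and the reply's input becomes `source [SEP] comment [SEP] reply`, so a classifier can see the hate the reply may be supporting. How to truncate long ancestor chains is left open here.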

Furthermore, this year we are introducing a multiclass task for the Hinglish language (Task 2B), which further divides the HOF class into two subclasses, giving three classes in total:

(SHOF) Standalone Hate - This tweet, comment, or reply contains hate, offensive, or profane content in itself.

(CHOF) Contextual Hate - The comment or reply supports the hate, offense, or profanity expressed in its parent. This includes affirming the hate with positive sentiment (example-2), as well as replies that contain apparent hate (example-3).

(NONE) Non-Hate - This tweet, comment, or reply does not contain hate, offensive, or profane content in itself.
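Since SHOF and CHOF both fall under HOF, and NONE corresponds to NOT, Task 2B labels can be collapsed onto the Task 2A label set, for example to check that a multiclass model's predictions agree with a binary one. A minimal sketch, assuming the label strings appear exactly as written above; `TASK_2B_TO_2A` and `to_task_2a` are hypothetical names.

```python
# Assumed mapping, derived from the class definitions: both hate subclasses
# collapse to HOF, and NONE corresponds to NOT.
TASK_2B_TO_2A = {"SHOF": "HOF", "CHOF": "HOF", "NONE": "NOT"}


def to_task_2a(label_2b: str) -> str:
    """Map a Task 2B label onto the binary Task 2A label set."""
    try:
        # Normalise whitespace and case before the lookup.
        return TASK_2B_TO_2A[label_2b.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown Task 2B label: {label_2b!r}") from None
```

For example, `[to_task_2a(l) for l in ["SHOF", "CHOF", "NONE"]]` returns `['HOF', 'HOF', 'NOT']`. Raising on unknown labels, rather than defaulting to NOT, keeps typos in a label file from silently becoming non-hate.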

Another such example, with code-mixed text:

The Source Tweet: Modi Ji COVID situation ko solve karne ke liye ideas maang rahe the. Mera idea hai resignation dedo please...

Translation: Modi ji (PM of India) was asking for ideas to solve the COVID situation in India. My idea is: please resign.

The Comment: Doctors aur Scientists se manga hai. Chutiyo se nahi. Baith niche. [HOF/SHOF]

Translation: They have asked Doctors and Scientists. Not fuckers. Sit down. [HOF/SHOF]

The reply: You totally nailed it, can’t stop laughing. [HOF/CHOF]

The reply has a positive sentiment, but that positivity favours the hate that the comment expresses towards the author of the source tweet. Because the reply supports the hate expressed in the comment, it is also hate speech.

This is the type of problem we’re aiming to solve via this shared task.

In the example above (example-4), the main tweet portrays hate against a religion, while the comment is hateful towards the author of the tweet rather than supporting the hate expressed in the main tweet. This is an example of two levels of a thread each containing standalone hate.

ICHCL Dataset & Baseline

We understand that FIRE hosts many beginner-friendly workshops every year, and this problem might not seem beginner-friendly. So we have decided to provide participants with a baseline model that serves as a template for steps such as importing data, preprocessing, feature extraction, and classification. Participants can modify the code and experiment with various settings. This baseline uses a pseudo-labelling approach.
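The sketch below is not the official baseline. It only illustrates the same template (preprocessing, bag-of-words features, classification) together with the general pseudo-labelling (self-training) idea: train on labelled data, predict unlabelled text, add confident predictions to the training set, and retrain. The multinomial Naive Bayes classifier, the preprocessing regexes, the confidence `threshold`, and the number of `rounds` are all assumptions chosen to keep the example dependency-free.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def preprocess(text: str) -> list[str]:
    """Lowercase, drop URLs and @mentions, keep word tokens (an assumed recipe)."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"\w+", text)


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> NaiveBayes:
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, tokens: list[str]) -> dict[str, float]:
        v = len(self.vocab)
        # Log prior plus smoothed log likelihood of each known token.
        scores = {
            c: self.prior[c]
            + sum(math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                  for t in tokens if t in self.vocab)
            for c in self.classes
        }
        # Softmax over log-scores, shifted by the max for numerical stability.
        m = max(scores.values())
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}


def pseudo_label(train_x: list[str], train_y: list[str], unlabeled_x: list[str],
                 threshold: float = 0.9, rounds: int = 3) -> NaiveBayes:
    """Self-training: move confident predictions on unlabelled text into training."""
    docs = [preprocess(t) for t in train_x]
    labels = list(train_y)
    pool = [preprocess(t) for t in unlabeled_x]
    for _ in range(rounds):
        model = NaiveBayes().fit(docs, labels)
        keep = []
        for tokens in pool:
            probs = model.predict_proba(tokens)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                docs.append(tokens)   # accept the pseudo-label
                labels.append(best)
            else:
                keep.append(tokens)   # retry in the next round
        if len(keep) == len(pool):    # nothing new was confident enough
            break
        pool = keep
    return NaiveBayes().fit(docs, labels)
```

A high threshold limits how many wrong pseudo-labels get fed back into training; lowering it adds more data at the cost of more noise. Swapping in a stronger classifier or different features only changes the `NaiveBayes` part, while the self-training loop stays the same.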



Note: The datasets are password-protected. Please register to get access; once registered, you will receive the passwords from noreply.hasoc@gmail.com. If you have not received the passwords after registration, please check your spam folder. For any queries, please reach out to us at hasocfire@googlegroups.com.

Task 2: Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL)

  • Training Data: Download
  • Test Data: Download

Results

Best Run Submitted Per Team Per Task

  • Task 2A: Download
  • Task 2B: Download

All Runs Submitted Per Task

  • Task 2A: Download
  • Task 2B: Download

Important Dates

Timeline:

All subtracks have independent timelines for training/test data release and run submission, which will be available on their respective websites. The common timeline is:

  • 15th July: Training data release
  • 15th August: Test set release and start of run submissions
  • 23rd August: Registration deadline
  • 29th August: Deadline for run submissions
  • 31st August: Results announcement
  • 20th September: Paper submission deadline
  • 5th October: Review distribution
  • 15th October: Revised paper submission

NOTE: All dates are in AoE timezone

Organisers

  • Thomas Mandl :- University of Hildesheim, Germany
  • Sandip Modha :- DA-IICT & LDRP-ITR, Gandhinagar, India
  • Shrey Satapara :- IIT Hyderabad, Hyderabad, India
  • Hiren Madhu :- IISc Bangalore, Bangalore, India
  • Prasenjit Majumder :- DA-IICT, Gandhinagar, India
  • Tharindu Ranasinghe :- University of Wolverhampton, UK
  • Marcos Zampieri :- Rochester Institute of Technology, USA
  • Alphaeus Eric Dmonte :- George Mason University, USA
  • Pavan Pandya :- LDRP-ITR, Gandhinagar, India

Student Coordinators

  • Nisarg Shah :- VSITR, Gandhinagar, India
  • Jagrat Patel :- LDRP-ITR, Gandhinagar, India
  • Jaivin Barot :- LDRP-ITR, Gandhinagar, India

Contact us

Subscribe to our mailing list for the latest announcements and discussions.

For any queries write to us at hasoc@googlegroups.com