HASOC (2023)

Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

Call for Participation

We are excited to announce the 5th edition of HASOC, which consists of several shared tasks, and we invite you to participate.

This edition has four tasks:

Task 1 focuses on identifying hate speech, offensive language, and profanity in different languages using natural language processing techniques.

• Task 1A deals with identifying hate and offensive content in Sinhala, a low-resource Indo-Aryan language spoken in Sri Lanka. The task involves classifying tweets into two categories: Hate and Offensive (HOF) or Non-Hate and Offensive (NOT). The dataset for this task is based on the Sinhala Offensive Language Detection dataset.

• Task 1B focuses on identifying hate and offensive content in Gujarati, another low-resource Indo-Aryan language spoken by approximately 50 million people in India. Similarly, participants need to classify tweets into HOF or NOT categories. The training set for this task consists of around 200 tweets.

For more details please visit Task 1

Task 2, known as the Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL), addresses the challenge of identifying hate speech and offensive content in code-mixed conversations on social media. Code-mixed text includes multiple languages within a single conversation. The task is divided into two subtasks.

• In Task 2a, participants need to perform binary classification on conversational tweets with tree-structured data. They have to determine whether a tweet, comment, or reply contains hate speech, offensive language, or profanity (HOF) or if it is non-hate and offensive (NOT). The classification should consider both the individual content and support for hate expressed in the parent tweet.

• Task 2b involves the classification of conversational tweets with tree-structured data into specific forms of hate. Participants must identify if the tweet, comment, or reply contains standalone hate (SHOF), contextual hate (CHOF) that supports hate expressed in the parent, or if it is non-hate (NONE).
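As a rough illustration of what "tree-structured data" can mean for both subtasks, the sketch below pairs every post in a conversation with the texts of its ancestors, so that a classifier can see the parent content when judging a reply. The Post class, its field names, and the with_context helper are hypothetical and are not the official ICHCL data format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Post:
    """Hypothetical node of a conversation tree: a tweet, comment, or reply."""
    text: str
    replies: List["Post"] = field(default_factory=list)


def with_context(post: Post, ancestors: Tuple[str, ...] = ()) -> Iterator[Tuple[List[str], str]]:
    """Yield (ancestor_texts, text) for the post and all of its descendants.

    Posts are visited depth-first, parents before their replies.
    """
    yield list(ancestors), post.text
    for reply in post.replies:
        yield from with_context(reply, ancestors + (post.text,))


# Placeholder thread: one tweet, two comments, and one reply to the first comment.
thread = Post("tweet", [Post("comment 1", [Post("reply")]), Post("comment 2")])

for context, text in with_context(thread):
    print(context, "->", text)
```

Each pair could then be labeled HOF/NOT (Task 2a) or SHOF/CHOF/NONE (Task 2b); how the context is fed to a model is left open here.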

For more details please visit Task 2

Task 3 aims to detect the hateful spans within a sentence that is already considered hateful. A hate span is a sequence of contiguous tokens that, in tandem, communicate the explicit hatefulness in a sentence.

• For instance, in the statement "Women ... Can't live with them... Can't shoot them," the portion "Can't shoot them" is considered a hateful span. This shared task aims to extract all such spans from a hateful text.

• The input texts are all in English. The detection of hateful spans is mapped to a sequence labeling problem. For every token of a sequence, we have manually annotated the start and end of a hateful span using BIO tagging, where "B" marks the beginning of a hate span, "I" marks its continuation, and "O" marks a non-hate token. The task is then to learn the correct sequence of BIO tags for a given sentence. For example, for the above sentence, the tag sequence for the preprocessed sentence will be of the form "women can't live with them can't shoot them" → "O O O O O B I I". An "I" tag cannot exist on its own and will always be preceded by either an "I" or a "B". A "B" tag, by contrast, can be immediately followed by an "O" when the span is just a single word.
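The BIO rules above can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' evaluation code: the function names is_valid_bio and bio_to_spans are hypothetical, and whitespace tokenization of the preprocessed sentence is an assumption.

```python
from typing import List, Tuple


def is_valid_bio(tags: List[str]) -> bool:
    """Check that every tag is B, I, or O and that no "I" directly follows an "O" or starts the sequence."""
    previous = "O"
    for tag in tags:
        if tag not in {"B", "I", "O"}:
            return False
        if tag == "I" and previous == "O":
            return False
        previous = tag
    return True


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int]]:
    """Convert BIO tags into (start, end) token index pairs, end exclusive."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    if not is_valid_bio(tags):
        raise ValueError("invalid BIO sequence")
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag in {"B", "O"} and start is not None:
            spans.append((start, i))  # a new "B" or an "O" closes the open span
            start = None
        if tag == "B":
            start = i
    if start is not None:
        spans.append((start, len(tags)))  # span runs to the end of the sentence
    return spans


# The example sentence from the task description, split on whitespace (an assumption).
tokens = "women can't live with them can't shoot them".split()
tags = "O O O O O B I I".split()

for start, end in bio_to_spans(tokens, tags):
    print(" ".join(tokens[start:end]))
```

For the example sentence, the single span covers token indices 5 to 7, that is, the last three tokens.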

For more details please visit Task 3

Task 4 aims to detect hate speech in Bengali, Bodo, and Assamese. It is a binary classification task. Each of the three datasets (one per language) consists of a list of sentences with their corresponding class: hate or offensive (HOF) or not hate (NOT). Data is primarily collected from Twitter, Facebook, and YouTube comments.

• The Macro F1 score will be the evaluation metric for the task. Team rank will be determined based on the Macro F1 score of the first part.
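For readers unfamiliar with the metric, Macro F1 is the unweighted mean of the per-class F1 scores, so the HOF and NOT classes count equally regardless of how many examples each has. The sketch below is a plain-Python illustration; the function name macro_f1, the toy labels, and the choice to score a class as 0.0 when it has no true or predicted examples are assumptions, and the organizers' actual evaluation script may differ.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = ("HOF", "NOT")) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); assumed to be 0.0 when undefined.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy gold labels and predictions, invented for illustration only.
gold = ["HOF", "HOF", "NOT", "NOT"]
pred = ["HOF", "NOT", "NOT", "NOT"]
print(round(macro_f1(gold, pred), 4))
```

In this toy case the HOF class scores 2/3 and the NOT class scores 4/5, giving a Macro F1 of 11/15, or about 0.7333.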

For more details please visit Task 4

We believe that your expertise and contribution will be invaluable in advancing the state-of-the-art in hate speech classification. We encourage you to participate in this exciting shared task and contribute to the research community.

Contact us

Subscribe to our mailing list for the latest announcements and discussions.

For any queries, write to us at hasoc@googlegroups.com.