HASOC provides a forum and a data challenge for multilingual research on the identification of problematic content. This year, we again offer English, German and Hindi with two sub-tasks, comprising over 10,000 annotated tweets altogether. Participants in this year’s shared task can choose to participate in one or both of the sub-tasks. Participants can look at the openly available data of HASOC 2019: https://hasocfire.github.io/hasoc/2019/dataset.html
New in HASOC 2020:
- Multilingual track joining English, German and Hindi (promoting research on multilingual techniques)
- A new sampling method that works independently of a seed word set (details will follow on the web site and in the overview paper)
- A sub-track is offered for Malayalam and Tamil.
Participants can discuss their questions in a forum.
Sub-task A: Identifying Hate, offensive and profane content
Sub-task A focuses on hate speech and offensive language identification and is offered for English, German and Hindi. It is a coarse-grained binary classification in which participating systems are required to classify tweets into two classes: Hate and Offensive (HOF) and Non Hate-Offensive (NOT).
- (NOT) Non Hate-Offensive - This post does not contain any hate speech, profanity, or offensive content.
- (HOF) Hate and Offensive - This post contains hate, offensive, or profane content.
Sub-task B: Discrimination between Hate, profane and offensive posts
This sub-task is a fine-grained classification offered for English, German and Hindi. Hate speech and offensive posts from sub-task A are further classified into three categories.
- (HATE) Hate speech - Posts under this class contain hate speech content.
- (OFFN) Offensive - Posts under this class contain offensive content.
- (PRFN) Profane - These posts contain profane words.
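The relationship between the two label sets can be sketched in a few lines of Python. This is only an illustration of the scheme described above, not part of the official HASOC tooling: the class and function names (CoarseLabel, FineLabel, coarse_from_fine, is_consistent) are hypothetical, and representing a post without a fine-grained label as None is an assumption made for this sketch.

```python
from enum import Enum
from typing import Optional


class CoarseLabel(Enum):
    """Sub-task A labels (binary)."""
    NOT = "NOT"  # no hate speech, profane, or offensive content
    HOF = "HOF"  # hate, offensive, or profane content


class FineLabel(Enum):
    """Sub-task B labels (fine-grained)."""
    HATE = "HATE"  # hate speech
    OFFN = "OFFN"  # offensive content
    PRFN = "PRFN"  # profane words


def coarse_from_fine(fine: Optional[FineLabel]) -> CoarseLabel:
    """Collapse a sub-task B label into the matching sub-task A label.

    Assumption for this sketch: None means the post has no
    fine-grained label, which corresponds to NOT in sub-task A.
    """
    return CoarseLabel.NOT if fine is None else CoarseLabel.HOF


def is_consistent(coarse: CoarseLabel, fine: Optional[FineLabel]) -> bool:
    """Check that a pair of predictions does not contradict itself."""
    return coarse_from_fine(fine) is coarse
```

A system that predicts both sub-tasks could use such a check to catch contradictory outputs, for example a post predicted as NOT in sub-task A but PRFN in sub-task B: is_consistent(CoarseLabel.NOT, FineLabel.PRFN) evaluates to False.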
Sub-track: Dravidian-CodeMix - Sentiment Analysis for Dravidian Languages in Code-Mixed Text