A Hierarchically-Labeled Portuguese Hate Speech Dataset

Authors
Fortuna, Paula [1, 3]
Rocha da Silva, Joao [1, 2]
Soler-Company, Juan [3]
Wanner, Leo [3, 4]
Nunes, Sergio [1, 2]
Affiliations
[1] Univ Porto, INESC TEC, Rua Dr Roberto Frias S-N, P-4200465 Porto, Portugal
[2] Univ Porto, FEUP, Rua Dr Roberto Frias S-N, P-4200465 Porto, Portugal
[3] Pompeu Fabra Univ, ETIC, NLP Grp, Barcelona, Spain
[4] Catalan Inst Res & Adv Studies ICREA, Barcelona, Spain
Funding
European Union Horizon 2020
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the past years, the amount of offensive speech online has grown steadily, and machine learning is increasingly applied to cope with it. However, ML-based techniques require sufficiently large annotated datasets. In recent years, several datasets have been published, mainly for English. In this paper, we present a new dataset for Portuguese, a language that has not been in focus so far. The dataset is composed of 5,668 tweets. For its annotation, we defined two different schemes, used by annotators with different levels of expertise. First, non-experts annotated the tweets with binary labels ('hate' vs. 'no-hate'). Then, expert annotators classified the tweets following a fine-grained hierarchical multi-label scheme with 81 hate speech categories in total. The inter-annotator agreement varied from category to category, which reflects the insight that some types of hate speech are more subtle than others and that their detection depends on personal perception. The hierarchical annotation scheme is the main contribution of the presented work, as it facilitates the identification of different types of hate speech and their intersections. To demonstrate the usefulness of our dataset, we carried out a baseline classification experiment with pre-trained word embeddings and an LSTM on the binary-labeled data, with a state-of-the-art outcome.
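As an illustration of what a hierarchical multi-label scheme implies in practice, the following minimal Python sketch shows how fine-grained labels could be expanded to their ancestors and how intersections between branches could be detected. The category names, the `PARENT` table, and the helper functions are hypothetical and chosen only for this example; they are not taken from the dataset's actual 81-category scheme or from the authors' code.

```python
from typing import Dict, Optional, Set

# Hypothetical fragment of a label hierarchy: child -> parent.
# None marks the root. These names are illustrative assumptions,
# not the categories defined by the dataset's authors.
PARENT: Dict[str, Optional[str]] = {
    "hate_speech": None,
    "sexism": "hate_speech",
    "racism": "hate_speech",
    "women": "sexism",
    "migrants": "racism",
}


def expand_labels(labels: Set[str]) -> Set[str]:
    """Return the given labels together with all of their ancestors."""
    expanded: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk up to the root, stopping early on already-visited nodes.
        while node is not None and node not in expanded:
            expanded.add(node)
            node = PARENT[node]
    return expanded


def binary_label(labels: Set[str]) -> str:
    """Collapse fine-grained labels to the binary 'hate' / 'no-hate' view."""
    return "hate" if "hate_speech" in expand_labels(labels) else "no-hate"


def is_intersection(labels: Set[str], a: str, b: str) -> bool:
    """True if the tweet falls under both category a and category b."""
    expanded = expand_labels(labels)
    return a in expanded and b in expanded


# A tweet annotated with two leaf categories from different branches.
tweet_labels = {"women", "migrants"}
print(sorted(expand_labels(tweet_labels)))
print(binary_label(tweet_labels))
print(is_intersection(tweet_labels, "sexism", "racism"))
```

Storing only the most specific labels and deriving ancestors on demand keeps annotations compact while still allowing queries at any level of the hierarchy, including the binary view.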
Pages: 94-104
Number of pages: 11