Context-aware and expert data resources for Brazilian Portuguese hate speech detection

Authors
Vargas, Francielle [1, 2]
Carvalho, Isabelle [1]
Pardo, Thiago A. S. [1]
Benevenuto, Fabricio [2]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, Brazil
[2] Univ Fed Minas Gerais, Comp Sci Dept, Belo Horizonte, Brazil
Keywords
hate speech; Brazilian Portuguese; low-resource languages; reliability; pragmatics
DOI
10.1017/nlp.2024.18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
This paper provides data resources for low-resource hate speech detection. Specifically, we introduce two data resources: (i) the HateBR 2.0 corpus, composed of 7,000 comments extracted from Brazilian politicians' accounts on Instagram and manually annotated with a binary class (offensive versus non-offensive) and hate speech targets, which updates the HateBR corpus by replacing highly similar and one-word comments; and (ii) the multilingual offensive lexicon (MOL), which consists of 1,000 explicit and implicit terms and expressions annotated with context information. The lexicon also comprises native-speaker translations and their cultural adaptations in English, Spanish, French, German, and Turkish. Both the corpus and the lexicon were annotated by three different experts and achieved high inter-annotator agreement. Lastly, we implemented baseline experiments on the proposed data resources. The results demonstrate the reliability of the data, outperforming baseline dataset results in Portuguese, and are promising for hate speech detection in different languages.
Pages: 22