Towards Generalized Offensive Language Identification

Authors
Dmonte, Alphaeus [1]
Arya, Tejas [2]
Ranasinghe, Tharindu [3]
Zampieri, Marcos [1]
Affiliations
[1] George Mason Univ, Fairfax, VA 22030 USA
[2] Rochester Inst Technol, Rochester, NY USA
[3] Univ Lancaster, Lancaster, England
Keywords
Offensive Language; Large Language Models; Generalizability
DOI
10.1007/978-3-031-78541-2_17
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The prevalence of offensive content on the internet, encompassing hate speech and cyberbullying, is a pervasive issue worldwide. Consequently, it has garnered significant attention from the machine learning (ML) and natural language processing (NLP) communities. As a result, numerous systems have been developed to automatically identify potentially harmful content and to mitigate its impact. These systems can follow two approaches: (i) use publicly available models and application endpoints, including prompting large language models (LLMs); (ii) annotate datasets and train ML models on them. However, it is not well understood how generalizable either approach is. Furthermore, the applicability of these systems is often questioned in off-domain and practical environments. This paper empirically evaluates the generalizability of offensive language detection models and datasets across a novel generalized benchmark: GenOffense. We answer three research questions on generalizability. Our findings will be useful in creating robust real-world offensive language detection systems.
Pages: 271-286
Number of pages: 16