Towards Generalized Offensive Language Identification

Cited by: 0
Authors
Dmonte, Alphaeus [1]
Arya, Tejas [2]
Ranasinghe, Tharindu [3]
Zampieri, Marcos [1]
Affiliations
[1] George Mason Univ, Fairfax, VA 22030 USA
[2] Rochester Inst Technol, Rochester, NY USA
[3] Univ Lancaster, Lancaster, England
Keywords
Offensive Language; Large Language Models; Generalizability;
DOI
10.1007/978-3-031-78541-2_17
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The prevalence of offensive content on the internet, including hate speech and cyberbullying, is a pervasive issue worldwide. Consequently, it has garnered significant attention from the machine learning (ML) and natural language processing (NLP) communities, and numerous systems have been developed to automatically identify potentially harmful content and mitigate its impact. These systems follow one of two approaches: (i) using publicly available models and application endpoints, including prompting large language models (LLMs), or (ii) annotating datasets and training ML models on them. However, it remains unclear how well either approach generalizes, and the applicability of these systems is often questioned in out-of-domain and practical settings. This paper empirically evaluates the generalizability of offensive language detection models and datasets on a novel generalized benchmark, GenOffense. We answer three research questions on generalizability. Our findings will be useful in creating robust real-world offensive language detection systems.
Pages: 271-286
Page count: 16