Towards Generalized Offensive Language Identification

Authors
Dmonte, Alphaeus [1]
Arya, Tejas [2]
Ranasinghe, Tharindu [3]
Zampieri, Marcos [1]
Affiliations
[1] George Mason Univ, Fairfax, VA 22030 USA
[2] Rochester Inst Technol, Rochester, NY USA
[3] Univ Lancaster, Lancaster, England
Keywords
Offensive Language; Large Language Models; Generalizability
DOI
10.1007/978-3-031-78541-2_17
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The prevalence of offensive content on the internet, encompassing hate speech and cyberbullying, is a pervasive issue worldwide. Consequently, it has garnered significant attention from the machine learning (ML) and natural language processing (NLP) communities. As a result, numerous systems have been developed to automatically identify potentially harmful content and to mitigate its impact. These systems can follow two approaches: (i) use publicly available models and application endpoints, including prompting large language models (LLMs), or (ii) annotate datasets and train ML models on them. However, it is not well understood how generalizable either approach is. Furthermore, the applicability of these systems is often questioned in off-domain and practical environments. This paper empirically evaluates the generalizability of offensive language detection models and datasets across a novel generalized benchmark: GenOffense. We answer three research questions on generalizability. Our findings will be useful in creating robust real-world offensive language detection systems.
Pages: 271-286
Number of pages: 16