Enhancing abusive language detection: A domain-adapted approach leveraging BERT pre-training tasks

Cited by: 0
Authors
Jarquín-Vásquez H. [1 ]
Escalante H.J. [1 ]
Montes-y-Gómez M. [1 ]
Affiliations
[1] Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro #1, Sta. María Tonantzintla, San Andrés Cholula, Puebla, Mexico
Keywords
Abusive language; Attention mechanisms; Pretraining tasks; Transformer models
DOI
10.1016/j.patrec.2024.05.007
Abstract
The widespread adoption of deep learning approaches in natural language processing is largely attributed to their exceptional performance across diverse tasks. Notably, Transformer-based models such as BERT have gained popularity for their remarkable efficacy and their ease of adaptation (via fine-tuning) to various domains. Despite their success, fine-tuning these models for informal language, particularly instances involving offensive expressions, presents a major challenge due to limitations in vocabulary coverage and contextual information for such tasks. To address these challenges, we propose the domain adaptation of the BERT language model for the task of detecting abusive language. Our approach involves constraining the language model through the adaptation and paradigm shift of its two default pre-training tasks, the design of two datasets specifically engineered to support the adapted pre-training tasks, and the proposal of a dynamic weighting loss function. The evaluation of these adapted configurations on six datasets dedicated to abusive language detection reveals promising outcomes, with a significant improvement over the base model. Furthermore, our proposed methods yield competitive results compared to state-of-the-art approaches, establishing a robust and easily trainable model for the effective identification of abusive language. © 2024 Elsevier B.V.
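
The abstract does not spell out how the dynamic weighting loss combines the two adapted pre-training objectives. As a minimal sketch, assuming weights proportional to each task's running-average loss (so the currently harder task is emphasized), one plausible PyTorch formulation follows; the class name, momentum scheme, and normalization here are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

class DynamicWeightingLoss(nn.Module):
    """Combine two pre-training task losses with weights that drift
    toward the currently harder task. The scheme (weights proportional
    to running-average losses) is an assumption for illustration only."""

    def __init__(self, momentum: float = 0.9):
        super().__init__()
        self.momentum = momentum
        # Running averages of the two task losses, one slot per task.
        self.register_buffer("running", torch.ones(2))

    def forward(self, loss_a: torch.Tensor, loss_b: torch.Tensor) -> torch.Tensor:
        losses = torch.stack([loss_a, loss_b])
        if self.training:
            # Exponential moving average; detached so the weighting
            # stays outside the gradient computation.
            self.running.mul_(self.momentum).add_((1 - self.momentum) * losses.detach())
        weights = self.running / self.running.sum()  # normalize to sum to 1
        return (weights * losses).sum()

# Hypothetical usage: balance a masked-language-modeling loss against a
# sentence-level (abusive vs. neutral) task loss during pre-training.
criterion = DynamicWeightingLoss()
mlm_loss = torch.tensor(2.3, requires_grad=True)       # placeholder value
sentence_loss = torch.tensor(0.7, requires_grad=True)  # placeholder value
total = criterion(mlm_loss, sentence_loss)
total.backward()
```

Under this assumed scheme the weights adapt at every step without introducing extra learnable parameters, which is consistent with the abstract's emphasis on an easily trainable model.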
Pages: 361-368
Related Papers
50 items in total
  • [1] Dict-BERT: Enhancing Language Model Pre-training with Dictionary
    Yu, Wenhao
    Zhu, Chenguang
    Fang, Yuwei
    Yu, Donghan
    Wang, Shuohang
    Xu, Yichong
    Zeng, Michael
    Jiang, Meng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022: 1907-1918
  • [2] Improving the Identification of Abusive Language Through Careful Design of Pre-training Tasks
    Jarquín-Vásquez, Horacio
    Escalante, Hugo Jair
    Montes-y-Gómez, Manuel
    PATTERN RECOGNITION, MCPR 2023, 2023, 13902: 283-292
  • [3] Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
    Zhuge, Mingchen
    Gao, Dehong
    Fan, Deng-Ping
    Jin, Linbo
    Chen, Ben
    Zhou, Haoming
    Qiu, Minghui
    Shao, Ling
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2021), 2021: 12642-12652
  • [4] A Domain-adaptive Pre-training Approach for Language Bias Detection in News
    Krieger, Jan-David
    Spinde, Timo
    Ruas, Terry
    Kulshrestha, Juhi
    Gipp, Bela
    2022 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL), 2022
  • [5] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Devlin, Jacob
    Chang, Ming-Wei
    Lee, Kenton
    Toutanova, Kristina
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019: 4171-4186
  • [6] Pre-Training BERT on Domain Resources for Short Answer Grading
    Sung, Chul
    Dhamecha, Tejas Indulal
    Saha, Swarnadeep
    Ma, Tengfei
    Reddy, Vinay
    Arora, Rishi
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 6071-6075
  • [7] Domain-adaptive pre-training on a BERT model for the automatic detection of misogynistic tweets in Spanish
    Rodríguez, Dalia A.
    Díaz-Escobar, Julia
    Díaz-Ramírez, Arnoldo
    Trujillo, Leonardo
    SOCIAL NETWORK ANALYSIS AND MINING, 2023, 13 (01)
  • [8] MenuNER: Domain-Adapted BERT Based NER Approach for a Domain with Limited Dataset and Its Application to Food Menu Domain
    Syed, Muzamil Hussain
    Chung, Sun-Tae
    APPLIED SCIENCES-BASEL, 2021, 11 (13)
  • [9] Enhancing medical text detection with vision-language pre-training and efficient segmentation
    Li, Tianyang
    Bai, Jinxu
    Wang, Qingzhu
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (03): 3995-4007