Enhancing abusive language detection: A domain-adapted approach leveraging BERT pre-training tasks

Cited by: 0
Authors
Jarquín-Vásquez H. [1]
Escalante H.J. [1]
Montes-y-Gómez M. [1]
Affiliations
[1] Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro #1, Sta María Tonanzintla, San Andrés Cholula, Puebla, Mexico
Keywords
Abusive language; Attention mechanisms; Pretraining tasks; Transformer models
DOI
10.1016/j.patrec.2024.05.007
Abstract
The widespread adoption of deep learning approaches in natural language processing is largely attributed to their exceptional performance across diverse tasks. Notably, Transformer-based models such as BERT have gained popularity for their remarkable efficacy and ease of adaptation (via fine-tuning) across various domains. Despite their success, fine-tuning these models for informal language, particularly instances involving offensive expressions, presents a major challenge due to limited vocabulary coverage and contextual information for such tasks. To address these challenges, we propose a domain adaptation of the BERT language model for the task of detecting abusive language. Our approach involves constraining the language model through the adaptation and paradigm shift of two of its default pre-training tasks, the design of two datasets specifically engineered to support the adapted pre-training tasks, and a dynamic weighting loss function. Evaluation of these adapted configurations on six datasets dedicated to abusive language detection reveals promising outcomes, with significant improvements over the base model. Furthermore, our proposed methods yield competitive results compared to state-of-the-art approaches, establishing a robust and easily trainable model for the effective identification of abusive language. © 2024 Elsevier B.V.
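The abstract does not give the exact formulation of the dynamic weighting loss that balances the two adapted pre-training objectives (BERT's defaults being masked language modeling and next sentence prediction). As a minimal, hypothetical PyTorch sketch, assuming the weights are derived from running averages of the per-task losses so that neither adapted task dominates training (the class name, momentum scheme, and weighting rule are all assumptions, not the paper's method):

    import torch
    import torch.nn as nn

    class DynamicWeightedLoss(nn.Module):
        # Hypothetical sketch of a dynamic weighting loss: combines two
        # pre-training losses with weights computed from exponential moving
        # averages of their magnitudes, so the task whose loss is currently
        # larger receives proportionally more weight. The paper's actual
        # dynamic weighting function may differ in form.
        def __init__(self, momentum: float = 0.9):
            super().__init__()
            self.momentum = momentum
            # Running average of each task's loss, initialized to 1.0 each.
            self.register_buffer("avg_loss", torch.ones(2))

        def forward(self, loss_a: torch.Tensor, loss_b: torch.Tensor) -> torch.Tensor:
            current = torch.stack([loss_a.detach(), loss_b.detach()])
            # Update the exponential moving average of the per-task losses.
            self.avg_loss = self.momentum * self.avg_loss + (1.0 - self.momentum) * current
            # Normalize the averages into weights that sum to 1.
            w = self.avg_loss / self.avg_loss.sum()
            return w[0] * loss_a + w[1] * loss_b

    # Usage with dummy scalar losses standing in for the two adapted
    # pre-training objectives:
    criterion = DynamicWeightedLoss()
    loss_a = torch.tensor(2.0, requires_grad=True)
    loss_b = torch.tensor(0.5, requires_grad=True)
    total = criterion(loss_a, loss_b)
    total.backward()

Detaching the losses before updating the running averages keeps the weighting statistics out of the computation graph, so gradients flow only through the weighted sum itself.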
Pages: 361-368
Related Papers
50 items in total
  • [31] Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge
    Chen, Zhihong
    Li, Guanbin
    Wan, Xiang
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5152 - 5161
  • [32] Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
    Liu, Zikang
    Chen, Sihan
    Guo, Longteng
    Li, Handong
    He, Xingjian
    Liu, Jing
    arXiv, 2023,
  • [33] Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
    Liu, Zikang
    Chen, Sihan
    Guo, Longteng
    Li, Handong
    He, Xingjian
    Liu, Jing
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5120 - 5131
  • [35] ST-BERT: CROSS-MODAL LANGUAGE MODEL PRE-TRAINING FOR END-TO-END SPOKEN LANGUAGE UNDERSTANDING
    Kim, Minjeong
    Kim, Gyuwan
    Lee, Sang-Woo
    Ha, Jung-Woo
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7478 - 7482
  • [36] QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search
    Xie, Jian
    Liang, Yidan
    Liu, Jingping
    Xiao, Yanghua
    Wu, Baohua
    Ni, Shenghua
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5282 - 5291
  • [37] Domain-Specific Language Model Pre-Training for Korean Tax Law Classification
    Gu, Yeong Hyeon
    Piao, Xianghua
    Yin, Helin
    Jin, Dong
    Zheng, Ri
    Yoo, Seong Joon
    IEEE ACCESS, 2022, 10 : 46342 - 46353
  • [38] An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training
    Arumae, Kristjan
    Sun, Qing
    Bhatia, Parminder
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 4854 - 4864
  • [39] Leveraging Contrastive Language–Image Pre-Training and Bidirectional Cross-attention for Multimodal Keyword Spotting
    Liu, Dong
    Mao, Qirong
    Gao, Lijian
    Wang, Gang
    Engineering Applications of Artificial Intelligence, 2024, 138
  • [40] Bridging the Gap between Recognition-level Pre-training and Commonsensical Vision-language Tasks
    Wan, Yue
    Ma, Yueen
    You, Haoxuan
    Wang, Zhecan
    Chang, Shih-Fu
    PROCEEDINGS OF THE FIRST WORKSHOP ON COMMONSENSE REPRESENTATION AND REASONING (CSRR 2022), 2022, : 23 - 35