Enhancing abusive language detection: A domain-adapted approach leveraging BERT pre-training tasks

Cited by: 0
Authors
Jarquín-Vásquez H. [1 ]
Escalante H.J. [1 ]
Montes-y-Gómez M. [1 ]
Affiliations
[1] Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro #1, Santa María Tonantzintla, San Andrés Cholula, Puebla, Mexico
Keywords
Abusive language; Attention mechanisms; Pretraining tasks; Transformer models
DOI: 10.1016/j.patrec.2024.05.007
Abstract
The widespread adoption of deep learning approaches in natural language processing is largely attributed to their exceptional performance across diverse tasks. Notably, Transformer-based models, such as BERT, have gained popularity for their remarkable efficacy and their ease of adaptation (via fine-tuning) across various domains. Despite their success, fine-tuning these models for informal language, particularly instances involving offensive expressions, presents a major challenge due to limitations in vocabulary coverage and contextual information for such tasks. To address these challenges, we propose the domain adaptation of the BERT language model for the task of detecting abusive language. Our approach involves constraining the language model through the adaptation and paradigm shift of two default pre-training tasks, the design of two datasets specifically engineered to support the adapted pre-training tasks, and the proposal of a dynamic weighting loss function. The evaluation of these adapted configurations on six datasets dedicated to abusive language detection reveals promising outcomes, with a significant enhancement observed compared to the base model. Furthermore, our proposed methods yield competitive results when compared to state-of-the-art approaches, establishing a robust and easily trainable model for the effective identification of abusive language. © 2024 Elsevier B.V.
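The abstract describes the approach only at a high level. As a rough illustration of the general idea (jointly training BERT on two adapted pre-training objectives, one token-level and one sequence-level, blended by a dynamically weighted loss), the following is a minimal PyTorch/Hugging Face sketch. The class name, the two task heads, and the linear weighting schedule are illustrative assumptions, not the authors' implementation; the paper's actual adapted tasks, engineered datasets, and loss definition are given in the full text.

```python
# Minimal sketch (NOT the authors' implementation): BERT with two illustrative
# pre-training heads whose losses are blended by a hypothetical dynamic weight.
import torch
import torch.nn as nn
from transformers import BertModel


class AdaptedBertPretrainer(nn.Module):
    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        vocab = self.bert.config.vocab_size
        # Head for an adapted token-level objective (MLM-style prediction).
        self.token_head = nn.Linear(hidden, vocab)
        # Head for an adapted sequence-level objective (binary pair label).
        self.pair_head = nn.Linear(hidden, 2)
        self.ce = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(self, input_ids, attention_mask, token_labels, pair_labels,
                step, total_steps):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        token_logits = self.token_head(out.last_hidden_state)   # (B, T, V)
        pair_logits = self.pair_head(out.pooler_output)         # (B, 2)

        loss_token = self.ce(token_logits.view(-1, token_logits.size(-1)),
                             token_labels.view(-1))
        loss_pair = self.ce(pair_logits, pair_labels)

        # Hypothetical dynamic weighting: emphasis shifts from the token-level
        # task toward the sequence-level task as training progresses.
        alpha = 1.0 - step / max(total_steps, 1)
        return alpha * loss_token + (1.0 - alpha) * loss_pair


if __name__ == "__main__":
    model = AdaptedBertPretrainer()
    ids = torch.randint(0, model.bert.config.vocab_size, (2, 16))
    mask = torch.ones_like(ids)
    tok_lbl = torch.full((2, 16), -100, dtype=torch.long)  # ignore by default
    tok_lbl[:, 3] = ids[:, 3]          # pretend one masked position per sequence
    pair_lbl = torch.tensor([0, 1])
    loss = model(ids, mask, tok_lbl, pair_lbl, step=100, total_steps=1000)
    print(float(loss))
```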
Pages: 361 - 368
Related papers
50 in total (entries [21] - [30] shown)
  • [21] Leveraging per Image-Token Consistency for Vision-Language Pre-training
    Gou, Yunhao
    Ko, Tom
    Yang, Hansi
    Kwok, James
    Zhang, Yu
    Wang, Mingxuan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19155 - 19164
  • [22] Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
    Xue, Chuhui
    Zhang, Wenqing
    Hao, Yu
    Lu, Shijian
    Torr, Philip H. S.
    Bai, Song
    COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 284 - 302
  • [23] ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
    Wang, Weihan
    Yang, Zhen
    Xu, Bin
    Li, Juanzi
    Sun, Yankui
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 3135 - 3146
  • [24] Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks
    Huang, Haoyang
    Liang, Yaobo
    Duan, Nan
    Gong, Ming
    Shou, Linjun
    Jiang, Daxin
    Zhou, Ming
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 2485 - 2494
  • [25] Efficient learning for spoken language understanding tasks with word embedding based pre-training
    Luan, Yi
    Watanabe, Shinji
    Harsham, Bret
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1398 - 1402
  • [26] ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks
    Pelloin, Valentin
    Dary, Franck
    Herve, Nicolas
    Favre, Benoit
    Camelin, Nathalie
    Laurent, Antoine
    Besacier, Laurent
    INTERSPEECH 2022, 2022, : 3453 - 3457
  • [27] Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records
    Ding, Junqi
    Li, Bo
    Xu, Chang
    Qiao, Yan
    Zhang, Lingxian
    APPLIED INTELLIGENCE, 2023, 53 (12) : 15979 - 15992
  • [28] MindLLM: Lightweight large language model pre-training, evaluation and domain application
    Yang, Yizhe
    Sun, Huashan
    Li, Jiawei
    Liu, Runheng
    Li, Yinghao
    Liu, Yuhang
    Gao, Yang
    Huang, Heyan
    AI Open, 2024, 5 : 1 - 26
  • [30] Re-train or Train from Scratch? Comparing Pre-training Strategies of BERT in the Medical Domain
    El Boukkouri, Hicham
    Ferret, Olivier
    Lavergne, Thomas
    Zweigenbaum, Pierre
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 2626 - 2633