Enhancing abusive language detection: A domain-adapted approach leveraging BERT pre-training tasks

Cited by: 0
Authors
Jarquín-Vásquez H. [1 ]
Escalante H.J. [1 ]
Montes-y-Gómez M. [1 ]
Institutions
[1] Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro #1, Sta. María Tonanzintla, San Andrés Cholula, Puebla, Mexico
Keywords
Abusive language; Attention mechanisms; Pretraining tasks; Transformer models
DOI
10.1016/j.patrec.2024.05.007
Abstract
The widespread adoption of deep learning approaches in natural language processing is largely attributed to their exceptional performance across diverse tasks. Notably, Transformer-based models such as BERT have gained popularity for their remarkable efficacy and ease of adaptation (via fine-tuning) across domains. Despite this success, fine-tuning these models for informal language, particularly text containing offensive expressions, remains a major challenge owing to limited vocabulary coverage and contextual information for such tasks. To address these challenges, we propose a domain adaptation of the BERT language model for abusive language detection. Our approach involves constraining the language model by adapting and shifting the paradigm of two of its default pre-training tasks, designing two datasets specifically engineered to support the adapted pre-training tasks, and proposing a dynamic weighting loss function. Evaluating these adapted configurations on six abusive language detection datasets yields promising results, with significant improvements over the base model. Furthermore, our methods are competitive with state-of-the-art approaches, establishing a robust and easily trainable model for the effective identification of abusive language. © 2024 Elsevier B.V.
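The dynamic weighting loss mentioned in the abstract is not detailed in this record; the sketch below illustrates one plausible reading, in which the losses of the two adapted pre-training tasks are combined with learnable, uncertainty-style weights. The class name, weighting scheme, and placeholder values are assumptions for illustration, not the paper's actual implementation.

    import torch
    import torch.nn as nn

    class DynamicWeightedLoss(nn.Module):
        """Hypothetical sketch: combines the losses of two adapted pre-training
        tasks with learnable, dynamically adjusted weights. The uncertainty-style
        weighting is an assumption, not the paper's exact formulation."""

        def __init__(self, num_tasks: int = 2):
            super().__init__()
            # One learnable log-variance per task; the effective task weight
            # exp(-log_var) adapts jointly with the model parameters.
            self.log_vars = nn.Parameter(torch.zeros(num_tasks))

        def forward(self, *task_losses: torch.Tensor) -> torch.Tensor:
            losses = torch.stack(task_losses)
            weights = torch.exp(-self.log_vars)
            # Weighted sum of the task losses plus a term that keeps the
            # weights from collapsing to zero.
            return (weights * losses + self.log_vars).sum()

    # Usage sketch with placeholder losses for the two adapted tasks.
    criterion = DynamicWeightedLoss(num_tasks=2)
    total_loss = criterion(torch.tensor(0.82), torch.tensor(1.37))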
Pages: 361-368
Related papers
50 records in total
  • [41] Pre-training Tasks for User Intent Detection and Embedding Retrieval in E-commerce Search. Qiu, Yiming; Zhao, Chenyu; Zhang, Han; Zhuo, Jingwei; Li, Tianhao; Zhang, Xiaowei; Wang, Songlin; Xu, Sulong; Long, Bo; Yang, Wen-Yun. Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM 2022), 2022: 4424-4428.
  • [42] CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising. Luo, Jianjie; Li, Yehao; Pan, Yingwei; Yao, Ting; Chao, Hongyang; Mei, Tao. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 5600-5608.
  • [43] Quantifying the advantage of domain-specific pre-training on named entity recognition tasks in materials science. Trewartha, Amalie; Walker, Nicholas; Huo, Haoyan; Lee, Sanghoon; Cruse, Kevin; Dagdelen, John; Dunn, Alexander; Persson, Kristin A.; Ceder, Gerbrand; Jain, Anubhav. Patterns, 2022, 3 (04).
  • [44] Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-training. Zhang, Wenyu; Shen, Li; Foo, Chuan-Sheng. International Journal of Computer Vision, 2024: 844-866.
  • [45] Pre-training language model incorporating domain-specific heterogeneous knowledge into a unified representation. Zhu, Hongyin; Peng, Hao; Lyu, Zhiheng; Hou, Lei; Li, Juanzi; Xiao, Jinghui. Expert Systems with Applications, 2023, 215.
  • [46] Multimodal detection of hateful memes by applying a vision-language pre-training model. Chen, Yuyang; Pan, Feng. PLOS ONE, 2022, 17 (09).
  • [47] Enhancing Visual Grounding in Vision-Language Pre-Training With Position-Guided Text Prompts. Wang, Alex Jinpeng; Zhou, Pan; Shou, Mike Zheng; Yan, Shuicheng. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46 (05): 3406-3421.
  • [48] RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection. Yuan, Hangjie; Jiang, Jianwen; Albanie, Samuel; Feng, Tao; Huang, Ziyuan; Ni, Dong; Tang, Mingqian. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [49] Leveraging Concept-Enhanced Pre-Training Model and Masked-Entity Language Model for Named Entity Disambiguation. Ji, Zizheng; Dai, Lin; Pang, Jin; Shen, Tingting. IEEE Access, 2020, 8: 100469-100484.