Enhancing abusive language detection: A domain-adapted approach leveraging BERT pre-training tasks

Cited by: 0
Authors
Jarquín-Vásquez H. [1 ]
Escalante H.J. [1 ]
Montes-y-Gómez M. [1 ]
Affiliations
[1] Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro #1, Sta. María Tonantzintla, San Andrés Cholula, Puebla, Mexico
Keywords
Abusive language; Attention mechanisms; Pretraining tasks; Transformer models
DOI
10.1016/j.patrec.2024.05.007
Abstract
The widespread adoption of deep learning approaches in natural language processing is largely attributed to their exceptional performance across diverse tasks. Notably, Transformer-based models such as BERT have gained popularity for their remarkable efficacy and their ease of adaptation (via fine-tuning) to various domains. Despite their success, fine-tuning these models for informal language, particularly instances involving offensive expressions, presents a major challenge due to limitations in vocabulary coverage and contextual information for such tasks. To address these challenges, we propose the domain adaptation of the BERT language model for the task of detecting abusive language. Our approach involves constraining the language model through the adaptation and paradigm shift of its two default pre-training tasks, the design of two datasets specifically engineered to support the adapted pre-training tasks, and the proposal of a dynamic weighting loss function. The evaluation of these adapted configurations on six datasets dedicated to abusive language detection reveals promising outcomes, with a significant improvement over the base model. Furthermore, our proposed methods yield competitive results compared to state-of-the-art approaches, establishing a robust and easily trainable model for the effective identification of abusive language. © 2024 Elsevier B.V.
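
The abstract does not spell out how the dynamic weighting loss combines the two adapted pre-training objectives. As a minimal sketch, assuming weights proportional to each task's running-average loss (so the currently harder task is emphasized), one plausible PyTorch formulation follows; the class name, momentum scheme, and normalization here are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

class DynamicWeightingLoss(nn.Module):
    """Combine two pre-training task losses with weights that drift
    toward the currently harder task. The scheme (weights proportional
    to running-average losses) is an assumption for illustration only."""

    def __init__(self, momentum: float = 0.9):
        super().__init__()
        self.momentum = momentum
        # Running averages of the two task losses, one slot per task.
        self.register_buffer("running", torch.ones(2))

    def forward(self, loss_a: torch.Tensor, loss_b: torch.Tensor) -> torch.Tensor:
        losses = torch.stack([loss_a, loss_b])
        if self.training:
            # Exponential moving average; detached so the weighting
            # stays outside the gradient computation.
            self.running.mul_(self.momentum).add_((1 - self.momentum) * losses.detach())
        weights = self.running / self.running.sum()  # normalize to sum to 1
        return (weights * losses).sum()

# Hypothetical usage: balance a masked-language-modeling loss against a
# sentence-level (abusive vs. neutral) task loss during pre-training.
criterion = DynamicWeightingLoss()
mlm_loss = torch.tensor(2.3, requires_grad=True)       # placeholder value
sentence_loss = torch.tensor(0.7, requires_grad=True)  # placeholder value
total = criterion(mlm_loss, sentence_loss)
total.backward()
```

Under this assumed scheme the weights adapt at every step without introducing extra learnable parameters, which is consistent with the abstract's emphasis on an easily trainable model.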
Pages: 361-368
Related Papers
50 items in total
  • [1] Dict-BERT: Enhancing Language Model Pre-training with Dictionary
    Yu, Wenhao
    Zhu, Chenguang
    Fang, Yuwei
    Yu, Donghan
    Wang, Shuohang
    Xu, Yichong
    Zeng, Michael
    Jiang, Meng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022: 1907-1918
  • [2] Improving the Identification of Abusive Language Through Careful Design of Pre-training Tasks
    Jarquín-Vásquez, Horacio
    Escalante, Hugo Jair
    Montes-y-Gómez, Manuel
    PATTERN RECOGNITION, MCPR 2023, 2023, 13902: 283-292
  • [3] Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
    Zhuge, Mingchen
    Gao, Dehong
    Fan, Deng-Ping
    Jin, Linbo
    Chen, Ben
    Zhou, Haoming
    Qiu, Minghui
    Shao, Ling
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2021), 2021: 12642-12652
  • [4] A Domain-adaptive Pre-training Approach for Language Bias Detection in News
    Krieger, Jan-David
    Spinde, Timo
    Ruas, Terry
    Kulshrestha, Juhi
    Gipp, Bela
    2022 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL), 2022
  • [5] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Devlin, Jacob
    Chang, Ming-Wei
    Lee, Kenton
    Toutanova, Kristina
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019: 4171-4186
  • [6] Pre-Training BERT on Domain Resources for Short Answer Grading
    Sung, Chul
    Dhamecha, Tejas Indulal
    Saha, Swarnadeep
    Ma, Tengfei
    Reddy, Vinay
    Arora, Rishi
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 6071-6075
  • [7] Domain-adaptive pre-training on a BERT model for the automatic detection of misogynistic tweets in Spanish
    Rodríguez, Dalia A.
    Díaz-Escobar, Julia
    Díaz-Ramírez, Arnoldo
    Trujillo, Leonardo
    SOCIAL NETWORK ANALYSIS AND MINING, 2023, 13 (01)
  • [8] MenuNER: Domain-Adapted BERT Based NER Approach for a Domain with Limited Dataset and Its Application to Food Menu Domain
    Syed, Muzamil Hussain
    Chung, Sun-Tae
    APPLIED SCIENCES-BASEL, 2021, 11 (13)
  • [9] Enhancing medical text detection with vision-language pre-training and efficient segmentation
    Li, Tianyang
    Bai, Jinxu
    Wang, Qingzhu
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (03): 3995-4007