SSL-GAN-RoBERTa: A robust semi-supervised model for detecting Anti-Asian COVID-19 hate speech on social media

Cited by: 4

Authors:

Su, Xuanyu [1]

Li, Yansong [1]

Branco, Paula [1]

Inkpen, Diana [1]

Affiliations:

[1] University of Ottawa, School of Electrical Engineering and Computer Science, Ottawa, ON, Canada

Source:

NATURAL LANGUAGE ENGINEERING | 2023 / Vol. 30 / Issue 06

Keywords:

Hate speech detection; Deep learning; Semi-supervised learning

DOI:

10.1017/S1351324923000396

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Anti-Asian speech during the COVID-19 pandemic has been a serious problem with severe consequences. A hate speech wave swept social media platforms. The timely detection of Anti-Asian COVID-19-related hate speech is of utmost importance, not only to allow the application of preventive mechanisms but also to anticipate and possibly prevent other similar discriminatory situations. In this paper, we address the problem of detecting Anti-Asian COVID-19-related hate speech from social media data. Previous approaches that tackled this problem used a transformer-based model, BERT/RoBERTa, trained on the homologous annotated dataset and achieved good performance on this task. However, this requires datasets that are both extensive and annotated, with a strong connection to the topic. Both requirements are difficult to meet without employing reliable, vast, and costly resources. In this paper, we propose a robust semi-supervised model, SSL-GAN-RoBERTa, that learns from a limited heterogeneous dataset and whose performance is further enhanced by using vast amounts of unlabeled data from another related domain. Compared with the RoBERTa baseline model, the experimental results show that the model has substantial performance gains in terms of Accuracy and Macro-F1 score in different scenarios that use data from different domains. Our proposed model achieves state-of-the-art performance while efficiently using unlabeled data, showing promising applicability to other complex classification tasks where large amounts of labeled examples are difficult to obtain.

Pages: 1161-1180

Page count: 20
