Resources and benchmark corpora for hate speech detection: a systematic review

Cited by: 153
Authors
Poletto, Fabio [1]
Basile, Valerio [1]
Sanguinetti, Manuela [1]
Bosco, Cristina [1]
Patti, Viviana [1]
Affiliations
[1] Univ Turin, Turin, Italy
Keywords
Hate speech detection; Benchmark corpora; Natural Language Processing shared tasks; Systematic review
DOI
10.1007/s10579-020-09502-8
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Hate Speech in social media is a complex phenomenon, whose detection has recently gained significant traction in the Natural Language Processing community, as attested by several recent review works. Annotated corpora and benchmarks are key resources, considering the vast number of supervised approaches that have been proposed. Lexica play an important role as well for the development of hate speech detection systems. In this review, we systematically analyze the resources made available by the community at large, including their development methodology, topical focus, language coverage, and other factors. The results of our analysis highlight a heterogeneous, growing landscape, marked by several issues and avenues for improvement.
Pages: 477-523
Page count: 47
Related papers
  • [1] Resources and benchmark corpora for hate speech detection: a systematic review
    Fabio Poletto
    Valerio Basile
    Manuela Sanguinetti
    Cristina Bosco
    Viviana Patti
    [J]. Language Resources and Evaluation, 2021, 55 : 477 - 523
  • [2] Systematic Literature Review Of Hate Speech Detection With Text Mining
    Rini
    Utami, Ema
    Hartanto, Anggit Dwi
    [J]. PROCEEDINGS OF ICORIS 2020: 2020 THE 2ND INTERNATIONAL CONFERENCE ON CYBERNETICS AND INTELLIGENT SYSTEM (ICORIS), 2020, : 228 - 233
  • [3] HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
    Mathew, Binny
    Saha, Punyajoy
    Yimam, Seid Muhie
    Biemann, Chris
    Goyal, Pawan
    Mukherjee, Animesh
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14867 - 14875
  • [4] Detection of fake news and hate speech for Ethiopian languages: a systematic review of the approaches
    Wubetu Barud Demilie
    Ayodeji Olalekan Salau
    [J]. Journal of Big Data, 9
  • [5] A systematic review of hate speech automatic detection using natural language processing
    Jahan, Md Saroar
    Oussalah, Mourad
    [J]. NEUROCOMPUTING, 2023, 546
  • [6] Detection of fake news and hate speech for Ethiopian languages: a systematic review of the approaches
    Demilie, Wubetu Barud
    Salau, Ayodeji Olalekan
    [J]. JOURNAL OF BIG DATA, 2022, 9 (01)
  • [7] Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection
    Bose, Tulika
    Aletras, Nikolaos
    Illina, Irina
    Fohr, Dominique
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 372 - 382
  • [8] Systematic keyword and bias analyses in hate speech detection
    Sarracen, Gretel Liz De la Peña
    Rosso, Paolo
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (05)
  • [9] Twitter Hate Speech Detection: A Systematic Review of Methods, Taxonomy Analysis, Challenges, and Opportunities
    Mansur, Zainab
    Omar, Nazlia
    Tiun, Sabrina
    [J]. IEEE ACCESS, 2023, 11 : 16226 - 16249
  • [10] Modern Standard Arabic Speech Corpora: A Systematic Review
    Alqadasi, Ammar Mohammed Ali
    Abdulghafor, Rawad
    Sunar, Mohd Shahrizal
    Salam, Md. Sah Bin H. J.
    [J]. IEEE ACCESS, 2023, 11 : 55771 - 55796