Machine learning techniques for hate speech classification of twitter data: State-of-the-art, future challenges and research directions

Cited by: 38
Authors
Ayo, Femi Emmanuel [1]
Folorunso, Olusegun [2]
Ibharalu, Friday Thomas [2]
Osinuga, Idowu Ademola [3]
Affiliations
[1] McPherson Univ, Dept Phys & Comp Sci, Seriki Sotayo, Ogun State, Nigeria
[2] Fed Univ Agr, Dept Comp Sci, Abeokuta, Ogun State, Nigeria
[3] Fed Univ Agr, Dept Math, Abeokuta, Ogun State, Nigeria
Keywords
Twitter data stream; Hate speech; Detection; Fuzzy logic; Bayesian network; Combinatorial algorithm; Sentiment analysis; Opportunities; Behavior
DOI
10.1016/j.cosrev.2020.100311
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Twitter is a microblogging tool that allows the creation of big data through short digital content. This study surveys machine learning techniques for hate speech classification from Twitter data streams. Hate speech classification in Twitter data streams has remained a vibrant research focus, but little research effort has been devoted to the design of a generic metadata architecture, threshold settings and fragmentation issues. The hate speech classification techniques presented in the literature address some of the challenges inherent in Twitter data streams but are limited with respect to these issues. This study presents a collection of hate speech benchmark datasets suitable for testing the efficiency of classification models, the pros and cons of single and hybrid machine learning methods in hate speech classification, and a summary of the performance evaluation of the surveyed machine learning methods. The study also presents a generic metadata architecture for hate speech classification in Twitter to tackle issues with Twitter data streams. The developed generic metadata architecture was observed to perform better than similar methods across all evaluation metrics for hate speech detection, with 0.95, 0.93, 0.92 and 0.93 for accuracy, precision, recall and F1-score respectively. Similarly, the developed generic metadata architecture for hate speech sentiment classification performed better than related methods, with an F1-score of 91.5%. The developed architecture also indicates a better-discriminating test than similar methods, with an AUC of 0.97. Statistical validation of the results supports the efficiency of the developed system. Finally, the results showed that the developed system performs well for automatic topic detection and categorization. (C) 2020 Elsevier Inc. All rights reserved.
Pages: 34