Development of an Efficient Method to Detect Mixed Social Media Data with Tamil-English Code Using Machine Learning Techniques

Times cited: 3
Authors
Fha, Shibly [1, 2]
Sharma, Uzzal [1]
Naleer, Hmm [3]
Affiliations
[1] Assam Don Bosco Univ, Gauhati, India
[2] South Eastern Univ Sri Lanka, Oluvil, Sri Lanka
[3] South Eastern Univ Sri Lanka, Fac Appl Sci, Dept Comp Sci, Oluvil, Sri Lanka
Keywords
Tamil; English; code mixed; hate speech; machine learning and ensemble classification
DOI
10.1145/3563775
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Online hate speech has become more prevalent on social networking sites with the rapid expansion of mobile computing and Web technology. Previous research has found that exposure to online hate speech has substantial offline consequences for historically disadvantaged communities, so there is considerable research interest in the automated detection of hateful comments and posts. Hate speech can affect any population group, but some groups are more vulnerable than others. Against this background, detecting and reporting hateful comments and posts can help to avoid the harmful effects of hate speech. Existing studies in this area have found machine learning algorithms to be the more efficient approach for detecting abusive text on social media. In this research, we applied seven machine learning algorithms, namely Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient Boosting (GB), and K-Nearest Neighbor (KNN), to detect hate speech, and compared their performance in order to develop an ensemble model. We collected and combined Tamil-English code-mixed hate speech tweet data created for HASOC. The resulting dataset contains 35,442 tweets, each labeled as either not offensive or offensive. NB obtained the highest F1 scores for both the offensive and not offensive classes, as well as the highest weighted average, whereas SVM obtained the highest accuracy in detecting Tamil-English hate speech, at 80% in 10-fold cross-validation. Based on these stand-alone results, we developed two ensemble classifiers: a max-voting ensemble and an averaging ensemble. The averaging ensemble achieved an accuracy of 90.67%.
These findings are significant because the results can serve as a reference model for Tamil-English code-mixed hate speech detection, against which future work using other algorithms can be evaluated in order to identify hateful content more accurately.
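The two ensemble strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-classifier probabilities are invented for a single hypothetical tweet, the 0.5 decision threshold is an assumption, and the helper names (`to_label`, `max_voting`, `averaging`) are hypothetical. Max-voting takes the label chosen by most base classifiers, while averaging takes the mean of their predicted probabilities and thresholds that mean.

```python
from collections import Counter

# Hypothetical P(offensive) for ONE tweet from the seven base classifiers
# named in the abstract. The numbers are invented for illustration only.
probs = {
    "SVM": 0.90, "NB": 0.80, "LR": 0.45, "DT": 0.40,
    "RF": 0.45, "GB": 0.40, "KNN": 0.45,
}

def to_label(p, threshold=0.5):
    """Turn a probability of the offensive class into a class label."""
    return "offensive" if p >= threshold else "not offensive"

def max_voting(probabilities, threshold=0.5):
    """Each classifier casts one hard vote; the most frequent label wins."""
    votes = Counter(to_label(p, threshold) for p in probabilities)
    return votes.most_common(1)[0][0]

def averaging(probabilities, threshold=0.5):
    """Average the probabilities first, then threshold the mean."""
    probabilities = list(probabilities)
    mean_p = sum(probabilities) / len(probabilities)
    return to_label(mean_p, threshold), mean_p

hard = max_voting(probs.values())
soft, mean_p = averaging(probs.values())
print(hard, "|", soft, "|", round(mean_p, 2))
```

With these made-up inputs the two ensembles disagree: five of the seven classifiers vote "not offensive", so max-voting returns that label, but the two confident classifiers pull the mean probability to about 0.55, so averaging returns "offensive". This shows how averaging can make use of classifier confidence that hard voting discards.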
Pages: 19