Machine learning algorithm-based spam detection in social networks

Cited by: 2
Authors
Sumathi, M. [1]
Raja, S. P. [2]
Affiliations
[1] SASTRA Deemed Univ, Sch Comp, Thanjavur, Tamil Nadu, India
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Keywords
Social network; Spam features; Spam detection; Machine learning algorithms; Accuracy; Precision; Recall; Voting classifier
DOI
10.1007/s13278-023-01108-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rapid expansion of online social networks (OSNs) has given rise to many social media (SM) platforms. As SM has become important in day-to-day life, spammers have turned their attention to it. Spam detection (SD) is done in two ways: machine learning (ML)-based detection and expert-based detection. The accuracy of expert-based detection depends on expert knowledge, and it takes a long time to detect spam. Thus, ML-based spam detection is preferred in OSNs. Spam identification on social networks is a difficult task involving a variety of factors, and the distribution of spam and ham is imbalanced, which gives spammers flexibility to corrupt users' devices. ML algorithms such as logistic regression (LR), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), support vector machine (SVM), eXtreme gradient boosting (XGB), voting classifier (VC) and extra tree classifier (ETC) are used to address the class imbalance and to attain high assessment accuracy on imbalanced datasets. The ETC method minimizes bias through the original sampling process. To reduce processing complexity, the ETC method uses a smaller constant factor rather than a larger one. Thus, the ETC technique produces better data splitting than the DT and RF techniques. Text is vectorized by vectorizers, and the resulting values are stored in the vectors. The VC is an ensemble method that integrates predictions from several methods to forecast an output class depending on which predictions have the highest probability. The multi-class results are aggregated, and the majority-voted class is forecast. The experimental results show that, compared to KN, NB, ETC, RF, SVC, LR, XGB and DT, the proposed VC provides a higher classification accuracy of 97.96%, with 97.56% precision, 89.95% recall and a 91.96% F1-measure. Similarly, ETC provides 97.77% accuracy, 98.31% precision, 84.78% recall and a 91.05% F1-measure.
Compared to conventional ML algorithms, VC and ETC provide higher accuracy, precision, recall and F1-measures. Thus, ETC and VC are preferable for spam detection. A website has been designed to classify messages as spam or not.
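The voting step and the reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract mentions both probability-based and majority voting without giving details, so both variants are shown. The function names (`hard_vote`, `soft_vote`, `precision_recall_f1`) and the toy base-model outputs are assumptions made up for illustration only.

```python
from collections import Counter

def hard_vote(predictions):
    """Majority voting: `predictions` holds one list of labels per base model."""
    voted = []
    for labels in zip(*predictions):  # labels from every model for one message
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted

def soft_vote(probabilities, classes=("ham", "spam")):
    """Probability voting: average class probabilities, then take the largest."""
    voted = []
    for per_model in zip(*probabilities):  # one probability row per model
        n = len(per_model)
        avg = [sum(row[k] for row in per_model) / n for k in range(len(classes))]
        voted.append(classes[avg.index(max(avg))])
    return voted

def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Precision, recall and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels from three hypothetical base models (not data from the paper).
y_true = ["spam", "ham", "spam", "ham", "spam"]
model_a = ["spam", "ham", "ham", "ham", "spam"]
model_b = ["spam", "spam", "spam", "spam", "ham"]
model_c = ["ham", "ham", "spam", "spam", "spam"]
hard_pred = hard_vote([model_a, model_b, model_c])

# Toy [P(ham), P(spam)] rows for two messages from the same three models.
probs_a = [[0.9, 0.1], [0.4, 0.6]]
probs_b = [[0.6, 0.4], [0.45, 0.55]]
probs_c = [[0.3, 0.7], [0.7, 0.3]]
soft_pred = soft_vote([probs_a, probs_b, probs_c])

precision, recall, f1 = precision_recall_f1(y_true, hard_pred)
print(hard_pred, soft_pred, precision, recall, f1)
```

In the second toy message, two of the three models lean towards spam, so majority voting would pick spam, while averaging the probabilities picks ham because the third model is more confident. Which of the two behaviours the paper's VC uses is not stated in the abstract.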
Pages: 13