Machine learning algorithm-based spam detection in social networks

Times cited: 2
Authors
Sumathi, M. [1]
Raja, S. P. [2]
Affiliations
[1] SASTRA Deemed Univ, Sch Comp, Thanjavur, Tamil Nadu, India
[2] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Keywords
Social network; Spam features; Spam detection; Machine learning algorithms; Accuracy; Precision; Recall; Voting classifier;
DOI
10.1007/s13278-023-01108-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Many social media (SM) platforms have emerged as a result of the rapid expansion of online social networks (OSNs). SM has become important in day-to-day life, and spammers have turned their attention to it. Spam detection (SD) is performed in two different ways: machine learning (ML)-based detection and expert-based detection. The accuracy of expert-based detection depends on expert knowledge, and detecting spam this way takes a long time. ML-based spam detection is therefore preferred in OSNs. Spam identification on social networks is a difficult task involving a variety of factors, and the imbalanced distribution of spam and ham (legitimate) messages gives spammers room to compromise users' devices. ML algorithms such as logistic regression (LR), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), support vector machine (SVM), eXtreme gradient boosting (XGB), voting classifier (VC) and extra trees classifier (ETC) are used to address the class imbalance and to achieve high accuracy on imbalanced datasets. The ETC method reduces bias by using the original training sample rather than bootstrap resamples. To reduce computational complexity, the ETC method uses a smaller constant factor instead of a larger one. As a result, ETC produces better data splits than the DT and RF techniques. The message text is converted into numerical feature vectors by vectorizers, and the resulting representations are stored for use by the classifiers. The VC is an ensemble method that combines the predictions of several models and predicts the output class with the highest probability: the results of the individual classifiers are aggregated, and the class that receives the majority of votes is predicted. The experimental results show that, compared with KNN, naive Bayes (NB), ETC, RF, SVC, LR, XGB and DT, the proposed VC achieves a higher classification accuracy of 97.96%, with 97.56% precision, 89.95% recall and a 91.96% F1-measure. Similarly, ETC achieves 97.77% accuracy, 98.31% precision, 84.78% recall and a 91.05% F1-measure.
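The voting step described above can be illustrated with a short Python sketch. This is not the authors' implementation: the function names `hard_vote` and `soft_vote`, and the input format (one label, or one class-probability mapping, per base classifier), are hypothetical. Hard voting returns the label chosen by the most classifiers; soft voting sums the per-class probabilities and returns the most probable class. These correspond to the "majority voted class" and "highest probability" readings of the description, and to the `voting="hard"` / `voting="soft"` options of scikit-learn's `sklearn.ensemble.VotingClassifier`.

```python
from collections import Counter
from typing import Mapping, Sequence


def hard_vote(predictions: Sequence[str]) -> str:
    """Majority (hard) voting: return the label predicted by the most base classifiers.

    On a tie, Counter.most_common lists equal counts in first-seen order,
    so the tied label encountered first wins.
    """
    label, _count = Counter(predictions).most_common(1)[0]
    return label


def soft_vote(probabilities: Sequence[Mapping[str, float]]) -> str:
    """Soft voting: sum each class's probability over all base classifiers
    and return the class with the largest total.

    Summing and averaging pick the same winner, because averaging only
    divides every total by the same number of classifiers.
    """
    totals: dict = {}
    for dist in probabilities:
        for label, p in dist.items():
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=totals.__getitem__)


if __name__ == "__main__":
    # Hypothetical outputs of three base classifiers for a single message.
    probs = [
        {"spam": 0.90, "ham": 0.10},
        {"spam": 0.40, "ham": 0.60},
        {"spam": 0.45, "ham": 0.55},
    ]
    # Hard voting sees only each classifier's top label: spam, ham, ham.
    labels = [max(d, key=d.__getitem__) for d in probs]
    print(hard_vote(labels))  # ham
    print(soft_vote(probs))   # spam (total 1.75 vs 1.25)
```

The example shows that the two schemes can disagree: one very confident "spam" vote outweighs two weak "ham" votes under soft voting but not under hard voting.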
Compared with conventional ML algorithms, VC and ETC provide higher accuracy, precision, recall and F1-measures, so ETC and VC are preferable for spam detection. A website has also been designed to classify messages as spam or not spam.
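The comparison above rests on four standard metrics. As a minimal sketch, with spam treated as the positive class, they can be computed from the counts of a binary confusion matrix. The helper names and the example counts are hypothetical and are not taken from the paper. The counts mimic an imbalanced test set (110 spam, 890 ham): accuracy stays high while recall is noticeably lower, which illustrates why precision, recall and F1 are useful alongside accuracy on imbalanced data.

```python
from __future__ import annotations


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive (spam) class,
    computed from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no positive predictions / no positive samples).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 110 spam and 890 ham messages.
    m = classification_metrics(tp=90, fp=10, fn=20, tn=880)
    print({k: round(v, 3) for k, v in m.items()})
    # {'accuracy': 0.97, 'precision': 0.9, 'recall': 0.818, 'f1': 0.857}
```

Applied to the ETC figures reported in the abstract (precision 98.31%, recall 84.78%), `f1_score(0.9831, 0.8478)` returns about 0.9105, which matches the reported 91.05% F1-measure.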
Pages: 13