Classifying Swahili Smishing Attacks for Mobile Money Users: A Machine-Learning Approach

Cited by: 6
Authors
Mambina, Iddi S. [1]
Ndibwile, Jema D. [2]
Michael, Kisangiri F. [1]
Affiliations
[1] Nelson Mandela Inst Sci & Technol, Sch Computat & Commun Sci & Engn, Arusha 447, Tanzania
[2] Carnegie Mellon Univ Africa, Coll Engn, Kigali, Rwanda
Source
IEEE Access | 2022 / Vol. 10
Keywords
Natural language processing; mobile money; machine-learning; SMS; Sub-Saharan Africa; social engineering; smishing; SECURITY MODEL; DETECTOR; MESSAGES;
DOI
10.1109/ACCESS.2022.3196464
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Due to the massive adoption of mobile money in Sub-Saharan countries, the global transaction value of mobile money exceeded $2 billion in 2021. Projections show that transaction values will exceed $3 billion by the end of 2022, with Sub-Saharan Africa contributing half of the daily transactions. SMS (Short Message Service) phishing costs corporations and individuals millions of dollars annually. Spammers use smishing (SMS phishing) messages to trick a mobile money user into sending electronic cash to an unintended mobile wallet. Although smishing is a form of phishing, the two differ in the information available and in attack strategy, which makes smishing difficult to detect. Numerous models and techniques for detecting smishing attacks have been introduced for high-resource languages, yet few target low-resource languages such as Swahili. This study proposes a machine-learning-based model to classify Swahili smishing text messages targeting mobile money users. Experimental results show that a hybrid model combining Extra Trees classifier feature selection with a Random Forest classifier, using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization, performs best, with an accuracy of 99.86%. Results are measured against a baseline Multinomial Naive Bayes model and compared with a set of other classic classifiers. The model returns the lowest false-positive and false-negative counts, 2 and 4 respectively, with a log-loss of 0.04. A Swahili dataset of 32,259 messages is used for performance evaluation.
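Two quantities named in the abstract, TF-IDF weights and log-loss, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the smoothed IDF formula is one common convention, chosen here as an assumption, and the function names `tfidf` and `log_loss`, the toy Swahili messages, and the toy probabilities are all hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumption: smoothed IDF, idf = ln((1 + n) / (1 + df)) + 1,
    with raw term counts and no vector normalisation.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred holds P(label == 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical toy messages, not taken from the paper's dataset.
messages = [
    "tuma pesa kwenye namba hii",
    "tuma pesa leo",
    "habari za asubuhi rafiki",
]
vectors = tfidf(messages)

# Hypothetical labels (1 = smishing) and predicted probabilities.
loss = log_loss([1, 1, 0], [0.9, 0.8, 0.1])
```

In this sketch a term that occurs in fewer messages (such as "namba") receives a larger weight than one shared across messages (such as "tuma"), and the log-loss shrinks toward zero as predicted probabilities approach the true labels.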
Pages: 83061-83074
Number of pages: 14