Comparisons of machine learning techniques for detecting fraudulent criminal identities

Authors
Kazemian, Hassan [1, 2]
Shrestha, Subeksha [1, 2]
Affiliations
[1] London Metropolitan Univ, Intelligent Syst Res Ctr, Sch Comp & Digital Media, London, England
[2] London Metropolitan Univ, 166-220 Holloway Rd, London N7 8DB, England
Funding
EU Horizon 2020
Keywords
Identity resolution; Policing dataset; TensorFlow; Support vector machine; K-nearest neighbour; Naive Bayes; Classification
DOI
10.1016/j.eswa.2023.120591
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper applies various machine learning techniques to an anonymized policing dataset used in the EU SPIRIT Horizon 2020 project, with the aim of identifying fraudulent identities and helping Law Enforcement Agencies (LEAs) find potential criminals and resolve identities in their investigations. A lack of good-quality data and of an appropriate methodology is a common reason why there is little research on fraudulent criminal identities. In addition, the data are highly sensitive: even a minor prediction error can have a serious impact on society, since genuine people could be questioned while criminals go free. This paper addresses both issues by using 39 million records from the policing dataset and by prioritising high accuracy when building the models. Several machine learning approaches are trained on the dataset, and the research focuses on predicting the 5 suspected fraudulent identities among the 39 million records. One of the techniques applied is TensorFlow with a Keras model, an approach that researchers have seldom applied to the detection of criminal data. To compare the results and test the accuracy of the TensorFlow model, Support Vector Machine, Naive Bayes and K-nearest Neighbours models are also applied, giving a comparative study of the outcomes of each model. The goal of this research is to find fraudulent IDs among all the anonymized IDs in the criminal dataset using TensorFlow and the three other machine learning models, and to select the best-performing model. Since the model compares two names, string-matching techniques such as Levenshtein edit distance, Hamming distance, Jaro-Winkler and Soundex were evaluated first to select an effective approach before building the model and analysing the results.
The TensorFlow model achieved the highest accuracy with a relatively low execution time, and was the only model to successfully predict all 5 suspects in the policing dataset.
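The abstract names four string-matching measures that were compared before the models were built. As a rough illustration only, the sketch below gives textbook definitions of each in plain Python. It is not the authors' implementation: the function names, the example name pairs, and the Jaro-Winkler prefix scale `p = 0.1` (the conventional default, with the common-prefix length capped at 4) are assumptions.

```python
# Textbook sketches of four name-comparison measures (not the paper's code).


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1], based on matching characters and transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)  # how far apart matches may be
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    t = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    m = matches
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to 4)."""
    j = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits encoding consonant sounds."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result.append(digit)
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit    # vowels reset prev, so a repeated code is kept
    return ("".join(result) + "000")[:4]


if __name__ == "__main__":
    # Hypothetical name pairs, purely for illustration.
    for x, y in [("Robert", "Rupert"), ("MARTHA", "MARHTA"), ("Smith", "Smyth")]:
        ham = hamming(x, y) if len(x) == len(y) else None
        print(f"{x}/{y}: lev={levenshtein(x, y)} ham={ham} "
              f"jw={jaro_winkler(x, y):.3f} soundex={soundex(x)}/{soundex(y)}")
```

Note that Hamming distance is only defined for strings of equal length, so in this sketch it raises an error for name pairs of different lengths, while the other three measures accept any pair. Levenshtein returns an edit count, Jaro-Winkler a similarity in [0, 1], and Soundex a phonetic code that is compared for equality, so the measures are not directly interchangeable as features.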
Pages: 13