Comparisons of machine learning techniques for detecting fraudulent criminal identities

Cited by: 0

Authors:

Kazemian, Hassan [1,2]

Shrestha, Subeksha [1,2]

Affiliations:

[1] London Metropolitan Univ, Intelligent Syst Res Ctr, Sch Comp & Digital Media, London, England

[2] London Metropolitan Univ, 166-220 Holloway Rd, London N7 8DB, England

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2023 / Vol. 229

Funding:

EU Horizon 2020;

Keywords:

Identity resolution; Policing dataset; TensorFlow; Support vector machine; K-nearest neighbour; Naive Bayes; CLASSIFICATION;

DOI:

10.1016/j.eswa.2023.120591

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper applies several machine learning techniques to an anonymized policing dataset used in the EU Horizon 2020 SPIRIT project, with the aim of identifying fraudulent identities and helping Law Enforcement Agencies (LEAs) with identity resolution and with finding potential criminals during investigations. Research on fraudulent criminal identities is scarce, commonly because of a lack of qualitative data and of an appropriate methodology. The data are also highly sensitive, and even a minor inaccuracy in prediction can have a serious impact on society: genuine people could be questioned while criminals go free. This paper addresses both issues by using 39 million records from the policing dataset and by working towards higher accuracy while building the model. Various machine learning approaches are trained on the dataset to make predictions, and the research focuses on predicting the 5 suspected fraudulent identities among the 39 million records. One of the applied techniques is a TensorFlow model built with Keras, which researchers have seldom applied to the detection of criminal data. To test the accuracy of the TensorFlow model and compare results, Support Vector Machine, Naive Bayes and K-nearest Neighbours models are also applied, giving a comparative study of the outcomes obtained from each model. The goal of this research is to find fraudulent IDs among all the anonymized IDs in the criminal dataset using TensorFlow and three other machine learning models, and to select the best of them. Because the model compares two names, string-matching techniques (Levenshtein edit distance, Hamming distance, Jaro-Winkler and Soundex) were applied first to select an effective approach before building the model and analysing the results. The TensorFlow model demonstrated the highest accuracy with comparatively the lowest execution time, and it was the only model to successfully predict all 5 suspects from the policing dataset.
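
To illustrate the name-comparison step mentioned in the abstract, the following is a minimal Python sketch of two of the listed string-matching measures, Levenshtein edit distance and Hamming distance. It is not the authors' implementation: the function names (`levenshtein`, `hamming`, `name_similarity`), the lowercase normalisation, and the length-normalised similarity score are all assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def name_similarity(a: str, b: str) -> float:
    """Hypothetical helper: edit distance scaled to a 0..1 similarity score."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(hamming("karolin", "kathrin"))
    print(name_similarity("Jon Smith", "John Smith"))
```

Hamming distance is only defined for strings of equal length, which is one reason an edit-distance or phonetic measure may suit name pairs better; a score such as `name_similarity` could then serve as one input feature to a classifier.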


Pages: 13

