Machine Learning Algorithms for Document Clustering and Fraud Detection

Cited by: 0
Authors
Yaram, Suresh [1]
Affiliations
[1] CSC India Ltd, Hyderabad, Andhra Pradesh, India
Keywords
machine learning; supervised learning; unsupervised learning; clustering; classification; decision tree; random forest; naive Bayes; class variable or dependent variable; feature variable or independent variable; document-term matrix; term frequency; inverse document frequency; Euclidean distance; information gain; entropy; confusion matrix; accuracy; precision; recall
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Machine learning plays a very important role in processing large amounts of structured and unstructured data. A set of algorithms can be used to gain meaningful insights into the data that help in making effective business decisions. Document clustering is a popular machine learning technique used to group unstructured data (text documents) based on content and to analyze the data further to understand the patterns in it. The unstructured data is transformed into semi-structured and then structured data in stages, using text mining and clustering (k-means) techniques. Classification is another machine learning technique that can be implemented for use cases such as fraud detection and the identification of cross-sell and up-sell opportunities in the banking, financial services, and insurance industry. This paper focuses on the implementation of a document clustering algorithm and a set of classification algorithms (Decision Tree, Random Forest, and Naive Bayes), along with appropriate industry use cases. The performance of the three classification algorithms is also compared by computing a confusion matrix, from which performance measures such as accuracy, precision, and recall are calculated.
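The clustering pipeline described in the abstract (text documents, then a TF-IDF weighted document-term matrix, then k-means with Euclidean distance) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy documents, the whitespace tokenizer, the unsmoothed IDF formula log(N / df), and the fixed initial centroids are all assumptions made here so that the example is small and deterministic.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a TF-IDF weighted document-term matrix (rows = documents).

    Assumes every document contains at least one token.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency times unsmoothed inverse document frequency.
        row = [
            (counts[term] / len(tokens)) * math.log(n_docs / df[term])
            for term in vocab
        ]
        matrix.append(row)
    return vocab, matrix


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(rows, initial_centroids, n_iter=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: each row joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(row, centroids[k]))
            for row in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [row for row, label in zip(rows, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus: two finance documents, two sports documents.
docs = [
    "loan fraud claim",
    "fraud claim insurance",
    "football match score",
    "match score goal",
]
vocab, matrix = tfidf_matrix(docs)
# Fixed starting centroids (documents 0 and 2) keep the run deterministic.
labels = kmeans(matrix, initial_centroids=[matrix[0], matrix[2]])
print(labels)  # [0, 0, 1, 1]
```

In practice the initial centroids are usually chosen at random or by a seeding heuristic, and the number of clusters has to be selected for the corpus at hand; both are fixed here only for illustration.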
Pages: 103-108
Page count: 6
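The performance measures named in the abstract follow directly from the four counts of a binary confusion matrix: accuracy = (TP + TN) / total, precision = TP / (TP + FP), and recall = TP / (TP + FN). The sketch below shows that arithmetic under stated assumptions: the label vectors are invented for illustration (1 = fraud, 0 = legitimate) and are not results from the paper, and the function names are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted fraud, was fraud
        elif p == positive:
            fp += 1  # predicted fraud, was legitimate
        elif a == positive:
            fn += 1  # predicted legitimate, was fraud
        else:
            tn += 1  # predicted legitimate, was legitimate
    return tp, fp, fn, tn


def performance_measures(tp, fp, fn, tn):
    """Derive accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Invented example labels, not data from the paper.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy, precision, recall = performance_measures(tp, fp, fn, tn)
print(tp, fp, fn, tn)  # 2 1 1 4
print(accuracy)        # 0.75
```

For fraud detection, where the positive class is typically rare, accuracy alone can look high even when most fraud cases are missed, which is one reason to report precision and recall alongside it.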