Detecting Fraudulent Claims in Automobile Insurance Policies by Data Mining Techniques

Authors
Simmachan, Teerawat [1, 2]
Manopa, Weerapong [1]
Neamhom, Pailin [1]
Poothong, Achiraya [1]
Phaphan, Wikanda [3, 4]
Affiliations
[1] Thammasat Univ, Fac Sci & Technol, Dept Math & Stat, Pathum Thani, Thailand
[2] Thammasat Univ Res Unit Data Learning, Thammasat Univ, Fac Sci & Technol, Pathum Thani, Thailand
[3] King Mongkuts Univ Technol North Bangkok, Fac Appl Sci, Dept Appl Stat, Bangkok, Thailand
[4] Res Grp Stat Learning & Inference, KMUTNB, Bangkok, Thailand
Source
THAILAND STATISTICIAN | 2023 / Vol. 21 / No. 03
Keywords
Naïve Bayes; random forest; adaptive boosting; logistic regression; variable selection;
DOI
Not available
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
The insurance industry is growing fast and handles substantial amounts of data. Fraudulent claims are the main problem in the industry, and auto insurance fraud is one of the most prominent types of insurance fraud. Numerous fraudulent claims affect not only the insurance company but also honest policyholders, because they increase premium amounts. Typically, fraud report data are unbalanced. Overlooking this generally leads to weak classifiers for predicting the minority class (fraudulent claims). Fraud detection is therefore a challenging problem. Traditional approaches are difficult to handle and inefficient, whereas data mining has recently offered significant contributions to insurance analysis. To overcome this, data mining techniques are used to predict fraudulent claims. The aims of this research are, first, to determine which types of features should be used to build the predictive model and, second, to develop a statistical learning strategy to classify whether a fraud report is fraudulent or not. To discover important sets of features, logistic regression (a parametric method) and random forest (a non-parametric method) are considered as variable selection tools. This process is carried out with cross-validation to reduce uncertainty until two sets of important features are obtained. Four algorithms, namely logistic regression, random forest, Naïve Bayes, and adaptive boosting, are employed as classifiers. A confusion matrix is used to evaluate each algorithm's performance. The results suggest that the set of important features obtained from the non-parametric method provides better performance than the set obtained from the parametric method. The random forest is considered the best algorithm for identifying fraudulent claims, with the highest sensitivity (99.19%) and positive predictive value (93.62%). This work would help in a screening process to investigate claims, thus minimizing human resources and monetary losses in the insurance industry.
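The two evaluation measures named in the abstract, sensitivity and positive predictive value, can both be read off a confusion matrix. The following is a minimal sketch of that calculation only, not the authors' code: the `ConfusionMatrix` class, the `confusion_matrix` helper, and the toy labels are all hypothetical and chosen for illustration, with 1 assumed to mark a fraudulent claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; 'positive' means fraudulent here (assumption)."""
    tp: int  # fraudulent claims flagged as fraudulent
    fp: int  # legitimate claims flagged as fraudulent
    fn: int  # fraudulent claims that were missed
    tn: int  # legitimate claims correctly passed

    @property
    def sensitivity(self) -> float:
        # Share of actual fraudulent claims that the classifier catches.
        # Raises ZeroDivisionError if there are no actual positives.
        return self.tp / (self.tp + self.fn)

    @property
    def positive_predictive_value(self) -> float:
        # Share of flagged claims that really are fraudulent.
        # Raises ZeroDivisionError if nothing was flagged.
        return self.tp / (self.tp + self.fp)


def confusion_matrix(y_true, y_pred, positive=1) -> ConfusionMatrix:
    """Tally the four cells from paired true and predicted labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionMatrix(tp=tp, fp=fp, fn=fn, tn=tn)


# Toy labels invented for illustration; they are not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

cm = confusion_matrix(y_true, y_pred)
print(f"sensitivity = {cm.sensitivity:.2%}")
print(f"positive predictive value = {cm.positive_predictive_value:.2%}")
```

On unbalanced data such as fraud reports, these two measures are more informative than overall accuracy, because a classifier that never flags a claim can still score high accuracy while having zero sensitivity.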
Pages: 552-568
Number of pages: 17