An engine to simulate insurance fraud network data

Authors
Bavo D. C. Campo [1]
Katrien Antonio [1]
Affiliations
[1] KU Leuven, Faculty of Economics and Business
[2] University of Amsterdam, Faculty of Economics and Business
[3] KU Leuven, LRisk, Leuven Research Center on Insurance and Financial Risk Analysis
[4] KU Leuven, LStat, Leuven Statistics Research Center
Keywords
Social network data; Simulation machine; Insurance fraud detection; Class imbalance; Unlabeled data
DOI
10.1007/s13385-024-00399-z
Abstract
Traditionally, the detection of fraudulent insurance claims relies on business rules and expert judgement, which makes it a time-consuming and expensive process (Óskarsdóttir et al. in Risk Anal 42(8):1872–1890, 2022). Consequently, researchers have been examining ways to develop efficient and accurate analytic strategies to flag suspicious claims. Feeding learning methods with features engineered from the social network of parties involved in a claim is a particularly promising strategy (see for example Óskarsdóttir et al. in Risk Anal 42(8):1872–1890, 2022; Van Vlasselaer et al. in Manag Sci 63(9):3090–3110, 2016; Tumminello et al. in J Risk Insur 90(2):381–419, 2023). When developing a fraud detection model, however, we are confronted with several challenges. The uncommon nature of fraud, for example, creates a high class imbalance, which complicates the development of well-performing analytic classification models. In addition, only a small number of claims are investigated and receive a label, which results in a large corpus of unlabeled data. Yet another challenge is the lack of publicly available data. This hinders not only the development of new methods, but also the validation of existing techniques. We therefore design a simulation machine that is engineered to create synthetic data with a network structure and available covariates similar to the real-life insurance fraud data set analyzed in Óskarsdóttir et al. (Risk Anal 42(8):1872–1890, 2022). Further, the user has control over several data-generating mechanisms. We can specify the total number of policyholders and parties, the desired level of imbalance and the (effect size of the) features in the fraud-generating model. As such, the simulation engine enables researchers and practitioners to examine several methodological challenges as well as to test their (development strategy of) insurance fraud detection models in a range of different settings. Moreover, large synthetic data sets can be generated to evaluate the predictive performance of (advanced) machine learning techniques.
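To make the idea of user-controlled data-generating mechanisms concrete, the following is a minimal sketch and not the authors' engine. It assumes a plain logistic fraud-generating model on independent standard-normal features and omits the network structure entirely. The function name `simulate_fraud_labels` and its parameters `n_claims`, `target_fraud_rate` and `effect_sizes` are hypothetical, chosen only to mirror the controls named in the abstract (data set size, level of imbalance, effect sizes of the features).

```python
import numpy as np


def simulate_fraud_labels(n_claims=10_000, target_fraud_rate=0.02,
                          effect_sizes=(1.0, 0.5), seed=0):
    """Illustrative sketch: draw fraud labels from a logistic model.

    Assumption (not from the paper): features are i.i.d. standard normal
    and fraud follows a logistic model whose intercept is tuned so that
    the average fraud probability equals `target_fraud_rate`.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(effect_sizes, dtype=float)
    X = rng.standard_normal((n_claims, beta.size))
    score = X @ beta  # linear predictor without intercept

    def mean_prob(intercept):
        return np.mean(1.0 / (1.0 + np.exp(-(intercept + score))))

    # Bisection on the intercept: mean_prob is increasing in the intercept,
    # so we can home in on the value that yields the target fraud rate.
    lo, hi = -30.0, 30.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mean_prob(mid) > target_fraud_rate:
            hi = mid
        else:
            lo = mid
    intercept = 0.5 * (lo + hi)

    prob = 1.0 / (1.0 + np.exp(-(intercept + score)))
    y = (rng.random(n_claims) < prob).astype(int)  # 1 = fraudulent claim
    return X, y, intercept


if __name__ == "__main__":
    X, y, intercept = simulate_fraud_labels()
    print(f"observed fraud rate: {y.mean():.4f}, intercept: {intercept:.3f}")
```

Varying `target_fraud_rate` changes the class imbalance while `effect_sizes` controls how strongly each feature separates fraudulent from legitimate claims, which is the kind of setting one would vary when stress-testing a detection model.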
Pages: 255-295
Number of pages: 40