An engine to simulate insurance fraud network data

Authors
Bavo D. C. Campo [1]
Katrien Antonio [1]
Affiliations
[1] KU Leuven, Faculty of Economics and Business
[2] University of Amsterdam, Faculty of Economics and Business
[3] KU Leuven, LRisk, Leuven Research Center on Insurance and Financial Risk Analysis
[4] KU Leuven, LStat, Leuven Statistics Research Center
Keywords
Social network data; Simulation machine; Insurance fraud detection; Class imbalance; Unlabeled data;
DOI
10.1007/s13385-024-00399-z
Abstract
Traditionally, the detection of fraudulent insurance claims relies on business rules and expert judgement, which makes it a time-consuming and expensive process (Óskarsdóttir et al. in Risk Anal 42(8):1872–1890, 2022). Consequently, researchers have been examining ways to develop efficient and accurate analytic strategies to flag suspicious claims. Feeding learning methods with features engineered from the social network of parties involved in a claim is a particularly promising strategy (see for example Óskarsdóttir et al. in Risk Anal 42(8):1872–1890, 2022; Van Vlasselaer et al. in Manag Sci 63(9):3090–3110, 2016; Tumminello et al. in J Risk Insur 90(2):381–419, 2023). When developing a fraud detection model, however, we are confronted with several challenges. The rarity of fraud, for example, creates a high class imbalance, which complicates the development of well-performing classification models. In addition, only a small number of claims are investigated and receive a label, which results in a large corpus of unlabeled data. Yet another challenge is the lack of publicly available data. This hinders not only the development of new methods, but also the validation of existing techniques. We therefore design a simulation machine that is engineered to create synthetic data with a network structure and available covariates similar to the real-life insurance fraud data set analyzed in Óskarsdóttir et al. (Risk Anal 42(8):1872–1890, 2022). Further, the user has control over several data-generating mechanisms: the total number of policyholders and parties, the desired level of imbalance, and the features (and their effect sizes) in the fraud-generating model. As such, the simulation engine enables researchers and practitioners to examine several methodological challenges as well as to test their insurance fraud detection models, and the strategies used to develop them, in a range of different settings.
Moreover, large synthetic data sets can be generated to evaluate the predictive performance of (advanced) machine learning techniques.
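As a rough illustration of the user controls described in the abstract (data set size, level of class imbalance, and feature effect sizes), the following minimal Python sketch draws fraud labels from a logistic model and calibrates the intercept so that the expected fraud rate matches a chosen target. It is not the authors' simulation engine: the function names, the feature names `network_score` and `claim_amount_z`, the standard-normal features, and the bisection-based calibration are all assumptions made for this example, and no network structure is simulated.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def calibrate_intercept(scores, target_rate, lo=-30.0, hi=30.0, iters=100):
    """Find the intercept whose mean fraud probability equals target_rate.

    Uses bisection, which works because the mean probability is
    monotonically increasing in the intercept.
    """
    def mean_prob(b0):
        return sum(sigmoid(b0 + s) for s in scores) / len(scores)

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mean_prob(mid) < target_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def simulate_claims(n_claims, target_rate, effects, seed=0):
    """Simulate claims with hypothetical standard-normal features.

    effects maps a (hypothetical) feature name to its effect size in
    the fraud-generating logistic model.
    """
    rng = random.Random(seed)
    names = list(effects)
    features = [{k: rng.gauss(0.0, 1.0) for k in names} for _ in range(n_claims)]
    scores = [sum(effects[k] * row[k] for k in names) for row in features]
    b0 = calibrate_intercept(scores, target_rate)
    probs = [sigmoid(b0 + s) for s in scores]
    labels = [1 if rng.random() < p else 0 for p in probs]
    return features, probs, labels


if __name__ == "__main__":
    # Assumed settings: 5000 claims, 2% expected fraud rate.
    effects = {"network_score": 1.5, "claim_amount_z": 0.8}
    features, probs, labels = simulate_claims(5000, 0.02, effects, seed=1)
    print("expected fraud rate:", sum(probs) / len(probs))
    print("observed fraud rate:", sum(labels) / len(labels))
```

Changing `target_rate` varies the imbalance without touching the effect sizes, which is the kind of controlled setting in which a detection model's sensitivity to rare positives could be examined.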
Pages: 255-295
Page count: 40
Related papers
50 in total
  • [1] Uncovering Insurance Fraud Conspiracy with Network Learning
    Liang, Chen
    Liu, Ziqi
    Liu, Bin
    Zhou, Jun
    Li, Xiaolong
    Yang, Shuang
    Qi, Yuan
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019 : 1181 - 1184
  • [2] Insurance fraud detection: A statistically validated network approach
    Tumminello, Michele
    Consiglio, Andrea
    Vassallo, Pietro
    Cesari, Riccardo
    Farabullini, Fabio
    JOURNAL OF RISK AND INSURANCE, 2023, 90 (02) : 381 - 419
  • [3] Social Network Analytics for Supervised Fraud Detection in Insurance
    Oskarsdottir, Maria
    Ahmed, Waqas
    Antonio, Katrien
    Baesens, Bart
    Dendievel, Remi
    Donas, Tom
    Reynkens, Tom
    RISK ANALYSIS, 2022, 42 (08) : 1872 - 1890
  • [4] Insurance fraud
    Derrig, RA
    JOURNAL OF RISK AND INSURANCE, 2002, 69 (03) : 271 - 287
  • [5] Data Exchange Platform to fight Insurance Fraud on Blockchain
    Nath, Indranil
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2016 : 821 - 825
  • [6] Data Sharing for Fraud Detection in Insurance: Challenges and Possibilities
    Soilen-Knutsen, Carl Christophe Louis
    Tessem, Bjornar
    ICEIS: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS - VOL 1, 2022 : 93 - 99
  • [7] INSURANCE FRAUD
    CLARKE, M
    BRITISH JOURNAL OF CRIMINOLOGY, 1989, 29 (01) : 1 - 20
  • [8] Big Data Science for Predicting Insurance Claims Fraud
    Kenyon, David
    Eloff, J. H. P.
    PROCEEDINGS OF THE 2017 INFORMATION SECURITY FOR SOUTH AFRICA (ISSA) CONFERENCE, 2017 : 40 - 47
  • [9] FRAUD TO INSURANCE
    Fernandez Arroyo, Laude Jose
    Quintero, Fernando Nino
    Carera Jaramaillo, Jorge Eduardo
    REVISTA CRIMINALIDAD, 2005, 48 : 349 - 357
  • [10] Data misrepresentation detection for insurance underwriting fraud prevention
    Vandervorst, Felix
    Verbeke, Wouter
    Verdonck, Tim
    DECISION SUPPORT SYSTEMS, 2022, 159