Explainable machine learning models for Medicare fraud detection

Cited by: 5
Authors
Hancock, John T. [1 ]
Bauder, Richard A. [1 ]
Wang, Huanjing [2 ]
Khoshgoftaar, Taghi M. [1 ]
Affiliations
[1] Florida Atlantic Univ, Coll Engn & Comp Sci, Boca Raton, FL 33004 USA
[2] Western Kentucky Univ, Ogden Coll Sci & Engn, Bowling Green, KY USA
Keywords
Big Data; Class imbalance; Explainable machine learning models; Ensemble supervised feature selection; Medicare fraud detection;
DOI
10.1186/s40537-023-00821-5
Chinese Library Classification
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
As a means of building explainable machine learning models for Big Data, we apply a novel ensemble supervised feature selection technique. The technique is applied to publicly available insurance claims data from Medicare, the United States public health insurance program. We approach Medicare insurance fraud detection as a supervised machine learning task of anomaly detection through the classification of highly imbalanced Big Data. Our objectives for feature selection are to increase efficiency in model training and to develop more explainable machine learning models for fraud detection. Using two Big Data datasets derived from two different sources of insurance claims data, we demonstrate that our feature selection technique reduces the dimensionality of the datasets by approximately 87.5% without compromising performance. Moreover, the reduction in dimensionality yields machine learning models that are easier to explain and less prone to overfitting. Our primary contribution, the exposition of this novel feature selection technique, therefore leads to a further contribution to the application domain of automated Medicare insurance fraud detection. We use our feature selection technique to explain our fraud detection models in terms of the definitions of the selected features. The ensemble supervised feature selection technique we present is flexible in that any collection of machine learning algorithms that maintain a list of feature importance values may be used. Therefore, researchers may easily employ variations of the technique we present.
Pages: 31
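The core idea summarized in the abstract, combining feature importance values from several supervised learners to select a much smaller feature set, can be illustrated with a minimal sketch. The estimators, the rank-averaging aggregation rule, and the one-eighth feature budget below are assumptions chosen for illustration; they are not the paper's exact models or thresholds.

```python
# Minimal sketch of ensemble supervised feature selection, assuming
# scikit-learn style estimators that expose feature_importances_.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier)

# Stand-in for highly imbalanced claims data (hypothetical, not Medicare data).
X, y = make_classification(n_samples=5000, n_features=80, n_informative=10,
                           weights=[0.99, 0.01], random_state=0)

# Any collection of learners that maintain feature importance values works here.
learners = [
    RandomForestClassifier(n_estimators=100, random_state=0),
    ExtraTreesClassifier(n_estimators=100, random_state=0),
    GradientBoostingClassifier(random_state=0),
]

# Rank features per model (rank 1 = most important), then average the ranks.
rank_matrix = []
for model in learners:
    model.fit(X, y)
    order = np.argsort(-model.feature_importances_)   # best feature first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, X.shape[1] + 1)       # rank assigned to each feature
    rank_matrix.append(ranks)
mean_rank = np.mean(rank_matrix, axis=0)

# Keep roughly one eighth of the features (about an 87.5% reduction).
n_keep = max(1, X.shape[1] // 8)
selected = np.argsort(mean_rank)[:n_keep]
X_reduced = X[:, selected]
print("selected feature indices:", sorted(selected.tolist()))
```

Because the aggregation operates only on per-model importance lists, the learner collection can be swapped for any other set of supervised algorithms that report feature importance, which is the flexibility the abstract refers to.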
Related Papers
50 items in total
  • [31] Effective depression detection and interpretation: Integrating machine learning, deep learning, language models, and explainable AI
    Al Masud, Gazi Hasan
    Shanto, Rejaul Islam
    Sakin, Ishmam
    Kabir, Muhammad Rafsan
    ARRAY, 2025, 25
  • [32] Towards Explainable Occupational Fraud Detection
    Tritscher, Julian
    Schloer, Daniel
    Gwinner, Fabian
    Krause, Anna
    Hotho, Andreas
    MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT II, 2023, 1753 : 79 - 96
  • [33] xFraud: Explainable Fraud Transaction Detection
    Rao, Susie Xi
    Zhang, Shuai
    Han, Zhichao
    Zhang, Zitao
    Min, Wei
    Chen, Zhiyao
    Shan, Yinan
    Zhao, Yang
    Zhang, Ce
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2021, 15 (03): 427 - 436
  • [34] Explainable Machine Learning for Lung Cancer Screening Models
    Kobylinska, Katarzyna
    Orlowski, Tadeusz
    Adamek, Mariusz
    Biecek, Przemyslaw
    APPLIED SCIENCES-BASEL, 2022, 12 (04)
  • [35] Editorial: Interpretable and explainable machine learning models in oncology
    Hrinivich, William Thomas
    Wang, Tonghe
    Wang, Chunhao
    FRONTIERS IN ONCOLOGY, 2023, 13
  • [36] Explainable Machine Learning Models Assessing Lending Risk
    Nassiri, Khalid
    Akhloufi, Moulay A.
    NAVIGATING THE TECHNOLOGICAL TIDE: THE EVOLUTION AND CHALLENGES OF BUSINESS MODEL INNOVATION, VOL 3, ICBT 2024, 2024, 1082 : 519 - 529
  • [37] Explainable machine learning models to analyse maternal health
    Patel, Shivshanker Singh
    DATA & KNOWLEDGE ENGINEERING, 2023, 146
  • [38] An ensemble framework for explainable geospatial machine learning models
    Liu, Lingbo
    INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 132
  • [39] Explainable Machine Learning for Improving Logistic Regression Models
    Yang, Yimin
    Wu, Min
    2021 IEEE 19TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2021
  • [40] The coming of age of interpretable and explainable machine learning models
    Lisboa, P. J. G.
    Saralajew, S.
    Vellido, A.
    Fernandez-Domenech, R.
    Villmann, T.
    NEUROCOMPUTING, 2023, 535 : 25 - 39