Improved Dirichlet mixture model clustering algorithm for medical data anomaly detection

Times cited: 0
Authors
Wu, Lili [1,2]
Ali, Majid Khan Majahar [3]
Shan, Fam Pei [3]
Tian, Ying [4]
Tao, Li [3]
Affiliations
[1] Xinzhou Teachers Univ, Dept Comp Sci, Xinzhou 034000, Peoples R China
[2] Univ Sains Malaysia USM, Sch Math Sci, George Town 11800, Malaysia
[3] USM, Sch Math Sci, George Town 11800, Malaysia
[4] Taiyuan Univ Technol, Dept Math, Taiyuan 030024, Peoples R China
Keywords
over-diagnosis; anomaly expenses; anomaly detection; DPMM; CBLOF
DOI
10.1504/IJBIC.2024.10064803
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
To identify over-diagnosis and anomalous expenses in the healthcare service process, a local outlier mining clustering algorithm (ILOF-DPMM) is proposed that combines the clustering-based local outlier factor (CBLOF) algorithm with the Dirichlet process mixture model (DPMM). Patients' hospitalisation records are extracted from the front page of the medical record, the factors influencing hospitalisation costs are categorised by disease type, and a random forest is used to reduce the feature dimensionality separately for each disease type. These feature extraction and dimensionality reduction steps allow the medical insurance expense data to be clustered effectively. When the LOF value of each data point is calculated, a weighted similarity that combines discrete and continuous features is used, which detects abnormal data points in the data set more accurately. The method can also evaluate newly arriving data in real time, improving both detection accuracy and efficiency.
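The abstract does not specify the exact weighting scheme, so the following is only a minimal sketch under stated assumptions of how an LOF score can use a weighted similarity over discrete and continuous features. It assumes Gower-style range-normalised differences for continuous features, simple matching (0 if equal, 1 otherwise) for discrete features, and a hypothetical blending weight `w_discrete`; this mixed distance then replaces Euclidean distance inside the standard LOF steps (k-distance, reachability distance, local reachability density). The DPMM clustering and CBLOF parts of ILOF-DPMM are not reproduced. The names `mixed_distance`, `lof_scores`, `w_discrete` and the sample records are all hypothetical.

```python
import math
from typing import List, Sequence, Union

Value = Union[float, int, str]


def mixed_distance(a: Sequence[Value], b: Sequence[Value],
                   is_discrete: Sequence[bool], ranges: Sequence[float],
                   w_discrete: float = 0.5) -> float:
    """Hypothetical weighted distance over mixed features (an assumption,
    not the paper's published formula).

    Continuous part: mean of range-normalised absolute differences.
    Discrete part: mean simple-matching mismatch (0 if equal, else 1).
    The two parts are blended with weight w_discrete.
    """
    cont, disc = [], []
    for j, (x, y) in enumerate(zip(a, b)):
        if is_discrete[j]:
            disc.append(0.0 if x == y else 1.0)
        else:
            cont.append(abs(float(x) - float(y)) / ranges[j] if ranges[j] > 0 else 0.0)
    if not disc:
        return sum(cont) / len(cont)
    if not cont:
        return sum(disc) / len(disc)
    return (1 - w_discrete) * sum(cont) / len(cont) + w_discrete * sum(disc) / len(disc)


def lof_scores(data: List[Sequence[Value]], is_discrete: Sequence[bool],
               k: int = 3, w_discrete: float = 0.5) -> List[float]:
    """Standard LOF, with mixed_distance in place of Euclidean distance."""
    n, m = len(data), len(is_discrete)
    # Value range of each continuous column, used for normalisation.
    ranges = [0.0 if is_discrete[j]
              else max(float(r[j]) for r in data) - min(float(r[j]) for r in data)
              for j in range(m)]
    dist = [[mixed_distance(data[i], data[j], is_discrete, ranges, w_discrete)
             for j in range(n)] for i in range(n)]

    k_dist, neighbours = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        # Keep every point within the k-distance, so ties are included.
        neighbours.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance
    # (a tiny constant avoids division by zero for duplicate points).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / len(reach) + 1e-10))

    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / len(neighbours[i]) / lrd[i]
            for i in range(n)]


# Hypothetical inpatient records: (total cost, length of stay in days, department).
records = [
    (3000.0, 5, "surgery"),
    (3200.0, 6, "surgery"),
    (3100.0, 5, "surgery"),
    (2900.0, 4, "surgery"),
    (3050.0, 5, "surgery"),
    (9800.0, 20, "icu"),
]
scores = lof_scores(records, is_discrete=[False, False, True], k=3)
print([round(s, 2) for s in scores])  # last record about 46, the others about 1.0
```

Because the continuous differences are range-normalised, a cost difference and a length-of-stay difference contribute on the same 0 to 1 scale; raising `w_discrete` makes a mismatch in a categorical field such as department count for more. An LOF near 1 means a record is about as dense as its neighbours, while values well above 1 flag candidate anomalies.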
Pages: 11-21
Number of pages: 12