Feature selection and classification model construction on type 2 diabetic patients' data

Cited by: 96
Authors
Huang, Yue [1]
McCullagh, Paul
Black, Norman
Harper, Roy
Affiliations
[1] Univ London Imperial Coll Sci Technol & Med, Fac Engn, Dept Comp, London SW7 2AZ, England
[2] Univ Ulster, Fac Engn, Sch Comp & Math, Jordanstown BT37 0QB, North Ireland
[3] Ulster Hosp, Belfast BT16 0RH, Antrim, North Ireland
Keywords
type 2 diabetes; blood glucose; data mining; classification; feature selection
DOI
10.1016/j.artmed.2007.07.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Objective: Diabetes affects between 2% and 4% of the global population (up to 10% in the over 65 age group), and its avoidance and effective treatment are undoubtedly crucial public health and health economics issues in the 21st century. The aim of this research was to identify significant factors influencing diabetes control, by applying feature selection to a working patient management system to assist with ranking, classification and knowledge discovery. The classification models can be used to determine individuals in the population with poor diabetes control status based on physiological and examination factors.
Methods: The diabetic patients' information was collected by Ulster Community and Hospitals Trust (UCHT) from year 2000 to 2004 as part of clinical management. In order to discover key predictors and latent knowledge, data mining techniques were applied. To improve computational efficiency, a feature selection technique, feature selection via supervised model construction (FSSMC), an optimisation of ReliefF, was used to rank the important attributes affecting diabetic control. After selecting suitable features, three complementary classification techniques (Naive Bayes, IB1 and C4.5) were applied to the data to predict how well the patients' condition was controlled.
Results: FSSMC identified patients' 'age', 'diagnosis duration', the need for 'insulin treatment', 'random blood glucose' measurement and 'diet treatment' as the most important factors influencing blood glucose control. Using the reduced features, a best predictive accuracy of 95% and sensitivity of 98% were achieved. The influence of factors such as 'type of care' delivered, the use of 'home monitoring', and the importance of 'smoking' on outcome can contribute to domain knowledge in diabetes control.
Conclusion: In the care of patients with diabetes, the more important factors identified, namely patients' 'age', 'diagnosis duration' and 'family history', are beyond the control of physicians. Treatment methods such as 'insulin', 'diet' and 'tablets' (a variety of oral medicines) may be controlled. However, lifestyle indicators such as 'body mass index' and 'smoking status' are also important and may be controlled by the patient. This further underlines the need for public health education to aid awareness and prevention. More subtle data interactions need to be better understood, and data mining can contribute to the clinical evidence base. The research confirms and, to a lesser extent, challenges current thinking. Whilst fully appreciating the requirement for clinical verification and interpretation, this work supports the use of data mining as an exploratory tool, particularly as the domain is suffering from a data explosion due to enhanced monitoring and the (potential) storage of this data in the electronic health record. FSSMC has proved a useful feature estimator for large data sets, where processing efficiency is an important factor. (c) 2007 Elsevier B.V. All rights reserved.
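The rank-then-classify pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' FSSMC method: it uses the basic Relief weighting scheme (the family that ReliefF and FSSMC build on) followed by a 1-nearest-neighbour classifier, which is roughly what IB1 does. The data are synthetic, and every function name, feature layout and parameter below is a hypothetical choice made for illustration only.

```python
import random

def manhattan(a, b, features):
    """Manhattan distance between two rows, restricted to the given feature indices."""
    return sum(abs(a[f] - b[f]) for f in features)

def nearest(X, y, i, same_class, features):
    """Index of the nearest neighbour of X[i] with the same (or a different) label."""
    candidates = [j for j in range(len(X))
                  if j != i and (y[j] == y[i]) == same_class]
    return min(candidates, key=lambda j: manhattan(X[i], X[j], features))

def relief_scores(X, y, n_iter=100, seed=0):
    """Basic Relief for two classes and numeric features scaled to [0, 1].

    A feature gains weight when it differs on the nearest miss (other class)
    and loses weight when it differs on the nearest hit (same class).
    """
    rng = random.Random(seed)
    features = range(len(X[0]))
    weights = [0.0] * len(X[0])
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hit = nearest(X, y, i, True, features)
        miss = nearest(X, y, i, False, features)
        for f in features:
            weights[f] += (abs(X[i][f] - X[miss][f])
                           - abs(X[i][f] - X[hit][f])) / n_iter
    return weights

def predict_1nn(X_train, y_train, x, features):
    """Label of the training row closest to x on the selected features."""
    j = min(range(len(X_train)),
            key=lambda k: manhattan(x, X_train[k], features))
    return y_train[j]

def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Accuracy over all rows; sensitivity = TP / (TP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return correct / len(pairs), tp / (tp + fn)

def make_synthetic(n=200, seed=42):
    """Synthetic stand-in data: feature 0 separates the classes, features 1-2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # 1 plays the role of "poorly controlled" (assumption)
        informative = rng.uniform(0.7, 1.0) if label else rng.uniform(0.0, 0.3)
        X.append([informative, rng.random(), rng.random()])
        y.append(label)
    return X, y

X, y = make_synthetic()
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

scores = relief_scores(X_train, y_train)
ranking = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
selected = ranking[:1]  # keep only the top-ranked feature

y_pred = [predict_1nn(X_train, y_train, x, selected) for x in X_test]
acc, sens = accuracy_and_sensitivity(y_test, y_pred)
print("ranking:", ranking, "accuracy:", acc, "sensitivity:", sens)
```

The sketch reports accuracy and sensitivity because those are the two measures quoted in the abstract. On real clinical data the features would need scaling, nominal attributes and missing values would need separate handling (as ReliefF does), and the figures would of course differ from those on this toy data.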
Pages: 251-262
Number of pages: 12