Real-Data Comparison of Data Mining Methods in Prediction of Diabetes in Iran

Cited by: 3
Authors
Tapak, Lily [1]
Mahjub, Hossein [2, 3]
Hamidi, Omid [4]
Poorolajal, Jalal [2, 3]
Affiliations
[1] Hamadan Univ Med Sci, Sch Publ Hlth, Dept Biostat, Hamadan, Iran
[2] Hamadan Univ Med Sci, Sch Publ Hlth, Res Ctr Hlth Sci, Hamadan, Iran
[3] Hamadan Univ Med Sci, Sch Publ Hlth, Dept Epidemiol & Biostat, Hamadan, Iran
[4] Hamadan Univ Technol, Dept Sci, Hamadan, Iran
Keywords
Diabetes; Cluster Sampling; Data Mining; Support Vector Machine; Logistic Regression
DOI
10.4258/hir.2013.19.3.177
Chinese Library Classification (CLC) number
R-058
Abstract
Objectives: Diabetes is one of the most common non-communicable diseases in developing countries. Early screening and diagnosis play an important role in effective prevention strategies. This study compared two traditional classification methods (logistic regression and Fisher linear discriminant analysis) and four machine-learning classifiers (neural networks, support vector machines, fuzzy c-means, and random forests) for classifying persons with and without diabetes.
Methods: The data set included 6,500 subjects from the Iranian national non-communicable diseases risk factors surveillance, obtained through a cross-sectional survey. The sample was drawn by cluster sampling of the Iranian population in a survey conducted in 2005-2009 to assess the prevalence of major non-communicable disease risk factors. Ten risk factors commonly associated with diabetes were selected, and the six classifiers were compared in terms of sensitivity, specificity, total accuracy, and area under the receiver operating characteristic (ROC) curve.
Results: Support vector machines showed the highest total accuracy (0.986) and the highest area under the ROC curve (0.979). This method also showed high specificity (1.000) and sensitivity (0.820). The other methods all achieved total accuracy above 85%, but their sensitivity values were very low (below 0.350).
Conclusions: In terms of sensitivity, specificity, and overall classification accuracy, the support vector machine model ranked first among the classifiers tested for predicting diabetes. It is therefore a promising classifier for predicting diabetes and should be investigated further for the prediction of other diseases.
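
The four evaluation criteria named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names and the toy labels and scores below are hypothetical, invented purely for illustration, and are not data from the study. The sketch also shows one way high total accuracy can coexist with low sensitivity, assuming the positive class is rare in the sample.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and total accuracy for binary labels (1 = diabetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        # Share of true cases that the classifier detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of non-cases that the classifier correctly rules out.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Share of all subjects classified correctly.
        "accuracy": (tp + tn) / len(y_true),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Equals the probability that a randomly chosen positive receives a higher
    score than a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy sample: 2 positives among 20 subjects (assumed imbalance).
toy_true = [1] * 2 + [0] * 18
# A degenerate classifier that labels every subject as non-diabetic.
toy_pred = [0] * 20
toy_metrics = classification_metrics(toy_true, toy_pred)

# Hypothetical scores for a five-subject sample, used only to exercise roc_auc.
auc_example = roc_auc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.2, 0.1])

if __name__ == "__main__":
    print(toy_metrics)
    print(round(auc_example, 4))
```

On the toy sample, the all-negative classifier is correct for 18 of 20 subjects while detecting none of the positives, which is why sensitivity and area under the ROC curve are usually reported alongside total accuracy when the outcome is rare.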
Pages: 177-185
Number of pages: 9