A dynamic ensemble approach to robust classification in the presence of missing data

Cited by: 0
Authors
Bryan Conroy
Larry Eshelman
Cristhian Potes
Minnan Xu-Wilson
Affiliation
[1] Philips Research North America
Source
Machine Learning | 2016 / Vol. 102
Keywords
Missing data; Ensemble methods; Hemodynamic instability
DOI
Not available
Abstract
Many real-world datasets suffer from missing or incomplete data. In healthcare, for example, patient measurements such as vital signs and lab values may be missing because of insufficient monitoring. When present, however, these features can be highly discriminative in predicting aspects of patient state, so it is desirable to incorporate sparsely measured features into a predictive model. Training predictive algorithms on such datasets is complicated by the missing data. The usual remedy is data imputation, in which values for the missing data are estimated first. Without strong prior knowledge about the relationships between features, though, it is common to fill in missing values with the respective population mean or median. The accuracy of this approach is limited, and it may simply inject noise into the data.

We propose a two-stage machine learning algorithm that learns a dynamic classifier ensemble from an incomplete dataset without data imputation. The algorithm is simple to implement and applicable across a wide range of problems. The first stage employs a variant of AdaBoost to learn a set of low-dimensional classifiers, each of which abstains from predicting if the feature(s) it depends on are missing. Our novel contribution is the second stage, dynamic ensemble learning, in which the low-dimensional classifiers are combined using a dynamic weighting that depends on the pattern of measured features in the present input. This makes the model resilient to missing data by adjusting the strength of certain classifiers to account for missing features.

We apply our algorithm to early detection of hemodynamic instability in ICU patients. An effective risk score for hemodynamic instability has the potential to give the clinician sufficient time to intervene, thereby reducing the chance of organ damage due to insufficient blood perfusion. We compare the results of our algorithm to other common missing data approaches, including mean imputation and multiple imputation methods, and discuss the advantages of the approach given the constraints of the application domain (e.g., high specificity to combat hospital alarm fatigue).
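The two ideas in the abstract, abstaining low-dimensional classifiers and missingness-dependent weights, can be illustrated with a small sketch. This is not the authors' implementation: the stumps, thresholds, weights, and the linear form of the weight adjustment are all hypothetical values chosen for illustration, and nothing here is learned from data. The three features stand in for arbitrary measurements (for instance a vital sign and two lab values).

```python
import numpy as np

# Hypothetical univariate stumps: (feature index, threshold, polarity).
# In the abstract these would come from an AdaBoost variant; here they are fixed by hand.
STUMPS = [(0, 100.0, 1.0), (1, 2.0, 1.0), (2, 65.0, -1.0)]

# Hypothetical static weight for each stump when every feature is measured.
BASE_WEIGHTS = np.array([0.5, 1.0, 0.8])

# ADJUST[i, j]: assumed change in stump i's weight when feature j is missing.
# A linear dependence on the missingness pattern is an assumption of this sketch.
ADJUST = np.array([
    [0.0, 0.4, 0.1],
    [0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0],
])


def stump_outputs(x):
    """Return +1/-1 votes per stump, or 0 (abstain) if the stump's feature is NaN."""
    out = np.zeros(len(STUMPS))
    for i, (feat, thr, pol) in enumerate(STUMPS):
        if not np.isnan(x[feat]):
            out[i] = pol if x[feat] > thr else -pol
    return out


def dynamic_weights(x):
    """Weights that depend on which features are measured in this input."""
    missing = np.isnan(x).astype(float)
    return BASE_WEIGHTS + ADJUST @ missing


def risk_score(x):
    """Weighted sum of stump votes; abstaining stumps contribute nothing."""
    return float(dynamic_weights(x) @ stump_outputs(x))


x_full = np.array([110.0, 3.0, 60.0])
x_partial = np.array([110.0, np.nan, 60.0])  # second feature not measured
print(risk_score(x_full), risk_score(x_partial))
```

With the second feature missing, its stump abstains, and the remaining stumps receive larger weights to compensate, so no imputed value ever enters the score. With static weights the score would simply drop by the abstaining stump's contribution.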
Pages: 443 - 463
Page count: 20
Related papers
  • [1] A dynamic ensemble approach to robust classification in the presence of missing data
    Conroy, Bryan
    Eshelman, Larry
    Potes, Cristhian
    Xu-Wilson, Minnan
    [J]. MACHINE LEARNING, 2016, 102 (03) : 443 - 463
  • [2] Clustering Data with the Presence of Missing Values by Ensemble Approach
    Pattanodom, Mullika
    Iam-On, Natthakan
    Boongoen, Tossapon
    [J]. 2016 SECOND ASIAN CONFERENCE ON DEFENCE TECHNOLOGY (ACDT), 2016, : 151 - 156
  • [3] Dynamic robust design with missing data
    Chang, Hsu-Hwa
    [J]. INTERNATIONAL JOURNAL OF QUALITY & RELIABILITY MANAGEMENT, 2007, 24 (06) : 602 - +
  • [4] Regression in the presence missing data using ensemble methods
    Hassan, Mostafa M.
    Atiya, Amir F.
    El-Gayar, Neamat
    El-Fouly, Raafat
    [J]. 2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 1261 - +
  • [5] Robust classification ensemble method for microarray data
    Chung, Dongjun
    Kim, Hyunjoong
    [J]. INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2011, 5 (05) : 504 - 518
  • [6] Ensemble Approach for the Classification of Imbalanced Data
    Nikulin, Vladimir
    McLachlan, Geoffrey J.
    Ng, Shu Kay
    [J]. AI 2009: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2009, 5866 : 291 - +
  • [8] A subspace ensemble framework for classification with high dimensional missing data
    Gao, Hang
    Jian, Songlei
    Peng, Yuxing
    Liu, Xinwang
    [J]. MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2017, 28 (04) : 1309 - 1324
  • [9] An Ensemble Approach for the Diagnosis of Cognitive Decline with Missing Data
    Garcia Baez, Patricio
    Fernandez Viadero, Carlos
    Regidor Garcia, Jose
    Suarez Araujo, Carmen Paz
    [J]. HYBRID ARTIFICIAL INTELLIGENCE SYSTEMS, 2008, 5271 : 353 - +
  • [10] eXITs: An Ensemble Approach for Imputing Missing EHR Data
    Codella, James
    Sarker, Hillol
    Chakraborty, Prithwish
    Ghalwash, Mohamed
    Yao, Zijun
    Sow, Daby
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2019, : 544 - 546