Stable feature selection for clinical prediction: Exploiting ICD tree structure using Tree-Lasso

Cited by: 56
Authors
Kamkar, Iman [1]
Gupta, Sunil Kumar [1]
Phung, Dinh [1]
Venkatesh, Svetha [1]
Affiliation
[1] Deakin Univ, Ctr Pattern Recognit & Data Analyt, Geelong, Vic 3217, Australia
Keywords
Feature selection; Lasso; Tree-Lasso; Feature stability; Classification; ACUTE MYOCARDIAL-INFARCTION; MODEL SELECTION; EHR DATA; REGRESSION; CANCER; STABILITY; READMISSION; ALGORITHMS; STRATEGIES; MORTALITY
DOI
10.1016/j.jbi.2014.11.013
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Modern healthcare is being reshaped by the growth of Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree), and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related ℓ1-penalty based feature selection methods have become popular because they select features jointly. However, when many features are correlated, Lasso is known to select one of them arbitrarily. This prevents clinicians from arriving at a stable feature set, which is crucial for clinical decision making. In this paper, we address this problem using the recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with other feature selection methods. Using one synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, across different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our results have implications for identifying stable risk factors for many healthcare problems and can therefore potentially assist clinical decision making for accurate medical prognosis. © 2014 Elsevier Inc. All rights reserved.
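The abstract gives neither the penalty nor the stability measure in formula form, so the following is only an illustrative sketch under stated assumptions. It assumes the commonly used tree-structured group penalty (a weighted sum of Euclidean norms of the coefficients under each tree node) and uses the average pairwise Jaccard similarity as a stand-in stability score. The toy tree, weights, coefficients and function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def tree_penalty(beta, groups):
    """Weighted sum of Euclidean norms of beta over tree nodes.

    `groups` is a list of (feature_indices, weight) pairs, one per node.
    This form of the penalty is an assumption, not quoted from the paper.
    """
    return sum(w * sqrt(sum(beta[i] ** 2 for i in idx)) for idx, w in groups)


def jaccard_stability(selections):
    """Average pairwise Jaccard similarity between selected feature sets.

    Used here as a stand-in stability score (an assumption).
    """
    if len(selections) < 2:
        raise ValueError("need at least two feature sets to compare")
    sims = []
    for a, b in combinations(selections, 2):
        union = a | b
        # Two empty selections are treated as identical.
        sims.append(len(a & b) / len(union) if union else 1.0)
    return sum(sims) / len(sims)


# Hypothetical tree over four code features: a root, two branches, four leaves.
groups = [
    ((0,), 1.0), ((1,), 1.0), ((2,), 1.0), ((3,), 1.0),  # leaves
    ((0, 1), 1.0), ((2, 3), 1.0),                        # branches
    ((0, 1, 2, 3), 1.0),                                 # root
]
beta = [3.0, 4.0, 0.0, 0.0]
penalty = tree_penalty(beta, groups)

# Feature sets selected on three hypothetical resamples of the data.
unstable = [{0}, {1}, {0}]          # flips between two correlated codes
stable = [{0, 1}, {0, 1}, {0, 1}]   # keeps the whole branch each time

print(penalty)
print(jaccard_stability(unstable), jaccard_stability(stable))
```

In this toy setting, a selector that alternates between two correlated codes scores much lower on the Jaccard measure than one that keeps the whole branch on every resample, which is the kind of instability the abstract attributes to Lasso.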
Pages: 277-290
Number of pages: 14