Imputing missing values in unevenly spaced clinical time series data to build an effective temporal classification framework

Cited by: 15
Authors
Nancy, Jane Y. [1]
Khanna, Nehemiah H. [1]
Arputharaj, Kannan [2]
Affiliations
[1] Anna Univ, Ramanujan Comp Ctr, Madras 600025, Tamil Nadu, India
[2] Anna Univ, Dept Informat Sci & Technol, Madras 600025, Tamil Nadu, India
Keywords
Time series; Missing value; Tolerance rough set; Particle swarm optimization; Inverse distance weight; HOT DECK; SPATIAL INTERPOLATION; MULTIPLE IMPUTATION;
DOI
10.1016/j.csda.2017.02.012
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
BACKGROUND: In the healthcare domain, clinical trials generate time-stamped data that record a set of observations on patient health status. These data are prone to missing values because patient observations are not always made regularly or updated correctly.
OBJECTIVE: This paper aims to impute missing values in unevenly spaced clinical time series data by proposing a tolerance rough set induced bio-statistical (TRiBS) framework. The proposed framework adopts the inverse distance weight (IDW) interpolation technique and improves it using the concepts of tolerance rough set (TR) and particle swarm optimization (PSO).
METHOD: When interpolating an unknown data point, classical IDW interpolation suffers from two major drawbacks: first, selecting the known data points, and second, choosing an optimal influence factor. The TRiBS framework overcomes the first limitation using TR and the second using PSO. TR derives the dependent attributes for each attribute using non-missing records. The nearest significant set is then generated for each missing value based on its attribute dependencies. The PSO technique fixes the weights for the data in a nearest significant set by finding an optimized influence factor. The obtained significant set and its influence factor are used in IDW computations to impute the missing value.
RESULT: The proposed work is evaluated on clinical time series datasets of hepatitis and thrombosis patients. However, the proposed system can support other clinical time series datasets with minor domain-specific changes.
CONCLUSION: The performance of the imputed results demonstrates the effectiveness of TRiBS. Experimental evaluation with classifiers such as neural networks, support vector machine (SVM) and decision tree shows an improvement in classification accuracy when missing data are pre-processed with the proposed framework. (C) 2017 Elsevier B.V. All rights reserved.
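The classical IDW step named in the abstract can be illustrated with a minimal sketch. This is not the TRiBS implementation: it covers only plain inverse distance weighting over time, with the set of known points and the influence factor supplied by hand, whereas the paper selects them with TR and PSO respectively. The function name `idw_impute`, the parameter `power` and the sample values are hypothetical and chosen only for illustration.

```python
def idw_impute(t_known, v_known, t_missing, power=2.0):
    """Estimate a value at t_missing by classical inverse distance weighting.

    t_known   : observation times (unevenly spaced is fine)
    v_known   : observed values at those times
    power     : influence factor (assumed fixed here; the paper tunes it with PSO)
    """
    weights = []
    for t, v in zip(t_known, v_known):
        distance = abs(t - t_missing)
        if distance == 0:
            # An observation exists at exactly this time, so return it directly.
            return float(v)
        weights.append(1.0 / distance ** power)
    total = sum(weights)
    # Weighted average: closer observations contribute more.
    return sum(w * v for w, v in zip(weights, v_known)) / total


# Hypothetical lab measurements taken on irregular days.
days = [0, 3, 4, 10]
values = [40.0, 46.0, 48.0, 60.0]
estimate = idw_impute(days, values, t_missing=5, power=2.0)
print(round(estimate, 3))
```

A larger `power` makes the estimate follow the nearest observation more closely, which is why the choice of influence factor matters for imputation quality.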
Pages: 63 - 79
Number of pages: 17
Related papers
50 in total
  • [31] A Temporal Mining Framework for Classifying Un-Evenly Spaced Clinical Data: An Approach for Building Effective Clinical Decision-Making System
    Jane, Nancy Yesudhas
    Nehemiah, Khanna Harichandran
    Arputharaj, Kannan
    APPLIED CLINICAL INFORMATICS, 2016, 7 (01) : 1 - 21
  • [32] Test-Cost Sensitive Classification on Data with Missing Values in the Limited Time
    Wan, Chang
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT I, 2010, 6276 : 501 - 510
  • [33] Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data
    Mir, Adil Aslam
    Kearfott, Kimberlee Jane
    Celebi, Fatih Vehbi
    Rafique, Muhammad
    PLOS ONE, 2022, 17 (01):
  • [34] Joint Modeling of Local and Global Temporal Dynamics for Multivariate Time Series Forecasting with Missing Values
    Tang, Xianfeng
    Yao, Huaxiu
    Sun, Yiwei
    Aggarwal, Charu
    Mitra, Prasenjit
    Wang, Suhang
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5956 - 5963
  • [35] Effective Prediction of Missing Data on Apache Spark over Multivariable Time Series
    Shi, Weiwei
    Zhu, Yongxin
    Yu, Philip S.
    Zhang, Jiawei
    Huang, Tian
    Wang, Chang
    Chen, Yufeng
    IEEE TRANSACTIONS ON BIG DATA, 2018, 4 (04) : 473 - 486
  • [36] Prediction of GWL with the help of GRACE TWS for unevenly spaced time series data in India : Analysis of comparative performances of SVR, ANN and LRM
    Mukherjee, Amritendu
    Ramachandran, Parthasarathy
    JOURNAL OF HYDROLOGY, 2018, 558 : 647 - 658
  • [37] DMP_MI: An Effective Diabetes Mellitus Classification Algorithm on Imbalanced Data With Missing Values
    Wang, Qian
    Cao, Weijia
    Guo, Jiawei
    Ren, Jiadong
    Cheng, Yongqiang
    Davis, Darryl N.
    IEEE ACCESS, 2019, 7 : 102232 - 102238
  • [38] An unsupervised neural network approach for imputation of missing values in univariate time series data
    Savarimuthu, Nickolas
    Karesiddaiah, Shobha
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (09):
  • [39] Augmenting energy time-series for data-efficient imputation of missing values
    Liguori, Antonio
    Markovic, Romana
    Ferrando, Martina
    Frisch, Jerome
    Causone, Francesco
    van Treeck, Christoph
    APPLIED ENERGY, 2023, 334
  • [40] Massively-Parallel Change Detection for Satellite Time Series Data with Missing Values
    Gieseke, Fabian
    Rosca, Sabina
    Henriksen, Troels
    Verbesselt, Jan
    Oancea, Cosmin E.
    2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020, : 385 - 396