Data Imputation for Multivariate Time Series Sensor Data With Large Gaps of Missing Data

Cited by: 11
Authors
Wu, Rui [1]
Hamshaw, Scott D. [2]
Yang, Lei [3]
Kincaid, Dustin W. [4]
Etheridge, Randall [5, 6]
Ghasemkhani, Amir [7]
Affiliations
[1] East Carolina Univ, Dept Comp Sci, Greenville, NC 27858 USA
[2] Univ Vermont, Dept Civil & Environm Engn, Burlington, VT 05405 USA
[3] Univ Nevada, Dept Comp Sci & Engn, Reno, NV 89557 USA
[4] Univ Vermont, Vermont EPSCoR & Gund Inst Environm, Burlington, VT 05405 USA
[5] East Carolina Univ, Dept Engn, Greenville, NC 27858 USA
[6] East Carolina Univ, Ctr Sustainable Energy & Environm Engn, Greenville, NC 27858 USA
[7] Calif State Univ San Bernardino, Dept Comp Sci & Engn, San Bernardino, CA 92407 USA
Funding
National Science Foundation (USA)
Keywords
Data imputation; large missing data gap; MICE; multivariate; time series; MULTIPLE IMPUTATION; SCALE; FLOW;
DOI
10.1109/JSEN.2022.3166643
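Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract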
Imputation of missing sensor-collected data is often an important step prior to machine learning and statistical data analysis. One particular data imputation challenge is filling large data gaps when the only related data come from the same sensor station. In this paper, we propose a framework to improve the popular multivariate imputation by chained equations (MICE) method for dealing with missing data. One key strategy we use to improve model accuracy is to reshape the original sensor data to leverage the correlation between the missing data and the observed data. We demonstrate our framework using data from continuous water quality monitoring stations in Vermont. Because peaks may occur at irregular intervals throughout the time series, the reshaped data are split into extreme and normal values, and a separate MICE model is built for each. We also recommend transforming sensor-collected data to meet the assumptions of the machine learning model. According to our experimental results, these strategies can improve the accuracy of MICE imputation by at least 23% for large data gaps, based on R² values, and show promise for application to other data imputation algorithms.
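The reshaping idea in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of "reshape": folding the flat series into rows of a fixed period, so that a large gap in one row lines up with observed values in other rows. The function names, the period of 4, and the toy values are hypothetical, and the column-mean fill is a deliberately simple stand-in, not MICE and not the authors' implementation.

```python
from statistics import mean

def reshape_by_period(series, period):
    """Fold a flat series into rows of `period` samples (e.g. one row per day).
    Trailing samples that do not fill a whole row are dropped."""
    n_rows = len(series) // period
    return [series[r * period:(r + 1) * period] for r in range(n_rows)]

def fill_from_aligned_columns(rows):
    """Stand-in imputer (NOT MICE): fill each None with the mean of the
    observed values at the same within-period position in other rows."""
    if not rows:
        return []
    filled = [row[:] for row in rows]
    for c in range(len(rows[0])):
        observed = [row[c] for row in rows if row[c] is not None]
        if not observed:
            continue  # no aligned observations; leave the gap unfilled
        col_mean = mean(observed)
        for row in filled:
            if row[c] is None:
                row[c] = col_mean
    return filled

# Toy series: three periods of four samples; the entire second period is missing.
series = [1.0, 2.0, 3.0, 2.0,
          None, None, None, None,
          3.0, 4.0, 5.0, 4.0]

rows = reshape_by_period(series, period=4)
filled = fill_from_aligned_columns(rows)
print(filled[1])  # [2.0, 3.0, 4.0, 3.0]
```

In the flat series the gap has no observed neighbours within reach, but after reshaping every missing cell shares a column with observed values. A chained-equations imputer could exploit that alignment by regressing each column on the others rather than taking a plain mean.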
Pages: 10671-10683
Page count: 13
Related papers
50 in total
  • [1] Missing Value Imputation for Industrial IoT Sensor Data With Large Gaps
    Liu, Yuehua
    Dillon, Tharam
    Yu, Wenjin
    Rahayu, Wenny
    Mostafa, Fahed
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2020, 7 (08): 6855-6867
  • [2] Multivariate Time Series Missing Data Imputation Using Recurrent Denoising Autoencoder
    Zhang, Jianye
    Yin, Peng
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019: 760-764
  • [3] Time Series Data and Recent Imputation Techniques for Missing Data: A Review
    Zainuddin, Aznilinda
    Hairuddin, Muhammad Asraf
    Yassin, Ahmad Ihsan Mohd
    Abd Latiff, Zatul Iffah
    Azhar, Aziemah
    [J]. 2022 INTERNATIONAL CONFERENCE ON GREEN ENERGY, COMPUTING AND SUSTAINABLE TECHNOLOGY (GECOST), 2022: 346-350
  • [4] Missing data imputation in multivariate data by evolutionary algorithms
    Figueroa Garcia, Juan C.
    Kalenatic, Dusko
    Lopez Bello, Cesar Amilcar
    [J]. COMPUTERS IN HUMAN BEHAVIOR, 2011, 27 (05): 1468-1474
  • [5] Missing Data Imputation in Time Series of Air Pollution
    Junger, Washington
    de Leon, Antonio Ponce
    [J]. EPIDEMIOLOGY, 2009, 20 (06): S87
  • [6] Imputation of missing data in time series for air pollutants
    Junger, W. L.
    de Leon, A. Ponce
    [J]. ATMOSPHERIC ENVIRONMENT, 2015, 102: 96-104
  • [7] Imputation of Missing Value Using Dynamic Bayesian Network for Multivariate Time Series Data
    Susanti, Steffi Pauli
    Azizah, Fazat Nur
    [J]. PROCEEDINGS OF 2017 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE), 2017
  • [8] Missing Data Imputation for Multivariate Time series in Industrial IoT: A Federated Learning Approach
    Gkillas, Alexandros
    Lalos, Aris S.
    [J]. 2022 IEEE 20TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2022: 87-94
  • [9] Missing Data Imputation in Time Series by Evolutionary Algorithms
    Figueroa Garcia, Juan C.
    Kalenatic, Dusko
    Lopez Bello, Cesar Amilcar
    [J]. ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, PROCEEDINGS: WITH ASPECTS OF ARTIFICIAL INTELLIGENCE, 2008, 5227: 275-+
  • [10] Long Gaps Missing IoT Sensors Time Series Data Imputation: A Bayesian Gaussian Approach
    Ahmed, Hassan M.
    Abdulrazak, Bessam
    Blanchet, F. Guillaume
    Aloulou, Hamdi
    Mokhtari, Mounir
    [J]. IEEE ACCESS, 2022, 10: 116107-116119