Distributed Neural Networks for Missing Big Data Imputation

Cited by: 0
Authors
Petrozziello, Alessio [1, 2]
Jordanov, Ivan [1]
Sommeregger, Christian [2]
Affiliations
[1] Univ Portsmouth, Portsmouth, Hants, England
[2] Expedia Inc, London, England
Keywords
Distributed Computation; Neural Networks; Missing Data Imputation; Big Data; ABSOLUTE ERROR MAE; RMSE
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we investigate the use of Distributed Neural Networks for imputing missing values in a Big Data context. The presented data imputation framework is implemented in Spark, so imputation can easily be added as an extra step in the data pre-processing pipeline. The Distributed Neural Networks model uses Mini-batch Stochastic Gradient Descent, which scales well with the cluster size and minimizes communication among the workers. The model is tested on a real-world Recommender Systems dataset, in which missing data is typically a problem for new items, because the system's ranking is usually biased towards popular items. The model is compared with univariate (Mean and Median Imputation) and multivariate (K-Nearest Neighbours and Linear Regression) imputation techniques, and its performance is validated in terms of prediction accuracy and speed. Furthermore, we evaluate the speedup over a sequential implementation of Neural Networks trained with Stochastic Gradient Descent.
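The abstract combines three ideas: a univariate baseline (mean imputation), a neural network trained with mini-batch SGD where each worker computes gradients on its own data partition, and evaluation by RMSE/MAE. The sketch below illustrates them on made-up data under stated assumptions; it is not the authors' implementation. It does not use Spark: the workers are simulated with a plain Python loop, and `TinyMLP`, `synchronous_step`, and `train_imputer` are hypothetical names. The one-hidden-layer tanh network and all hyperparameters (4 workers, batch size 8, learning rate 0.1) are arbitrary choices for illustration, and only the mean baseline is shown (not median, KNN, or linear regression).

```python
import math
import random


def mean_impute(values):
    """Univariate baseline: replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mu = sum(observed) / len(observed)
    return [mu if v is None else v for v in values]


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


class TinyMLP:
    """Hypothetical stand-in network: one tanh hidden layer and a linear output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def predict(self, x):
        """Return (hidden activations, scalar output) for one input row."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def gradients(self, batch):
        """Squared-error gradients summed over one worker's mini-batch."""
        n_h, n_in = len(self.w1), len(self.w1[0])
        gw1, gb1 = [[0.0] * n_in for _ in range(n_h)], [0.0] * n_h
        gw2, gb2 = [0.0] * n_h, 0.0
        for x, y in batch:
            h, out = self.predict(x)
            d_out = out - y  # derivative of 0.5 * (out - y) ** 2
            gb2 += d_out
            for j in range(n_h):
                gw2[j] += d_out * h[j]
                d_h = d_out * self.w2[j] * (1.0 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z) ** 2
                gb1[j] += d_h
                for i in range(n_in):
                    gw1[j][i] += d_h * x[i]
        return gw1, gb1, gw2, gb2


def synchronous_step(model, worker_batches, lr):
    """One synchronous update: each simulated worker computes gradients on its own
    mini-batch (in parallel on a real cluster); the driver averages them over all
    examples and updates the shared weights once."""
    grads = [model.gradients(b) for b in worker_batches]
    n = sum(len(b) for b in worker_batches)
    for j in range(len(model.w1)):
        for i in range(len(model.w1[j])):
            model.w1[j][i] -= lr * sum(g[0][j][i] for g in grads) / n
        model.b1[j] -= lr * sum(g[1][j] for g in grads) / n
        model.w2[j] -= lr * sum(g[2][j] for g in grads) / n
    model.b2 -= lr * sum(g[3] for g in grads) / n


def train_imputer(rows, targets, n_workers=4, batch_size=8, epochs=200, lr=0.1, seed=0):
    """Fit the network on complete rows, which are split into static worker partitions."""
    rng = random.Random(seed)
    data = list(zip(rows, targets))
    partitions = [data[k::n_workers] for k in range(n_workers)]
    model = TinyMLP(len(rows[0]), n_hidden=8, seed=seed)
    steps_per_epoch = max(1, len(data) // (n_workers * batch_size))
    for _ in range(epochs * steps_per_epoch):
        batches = [rng.sample(p, min(batch_size, len(p))) for p in partitions]
        synchronous_step(model, batches, lr)
    return model


# Made-up data: the column y = 0.5*x1 - 0.3*x2 is missing for 40 of 200 rows.
rng = random.Random(42)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y_true = [0.5 * a - 0.3 * b for a, b in X]
missing = set(rng.sample(range(200), 40))
y_obs = [None if i in missing else v for i, v in enumerate(y_true)]
complete = [i for i in range(200) if i not in missing]
model = train_imputer([X[i] for i in complete], [y_true[i] for i in complete])

idx = sorted(missing)
truth = [y_true[i] for i in idx]
nn_pred = [model.predict(X[i])[1] for i in idx]
filled = mean_impute(y_obs)
mean_pred = [filled[i] for i in idx]
print(f"mean imputation: RMSE={rmse(truth, mean_pred):.3f} MAE={mae(truth, mean_pred):.3f}")
print(f"network:         RMSE={rmse(truth, nn_pred):.3f} MAE={mae(truth, nn_pred):.3f}")
```

Only gradient sums, not the data partitions, need to reach the driver in each step, which is the property the abstract credits for low communication among workers. On an actual Spark cluster, the per-partition gradient sums could, for example, be combined with PySpark's `RDD.treeAggregate`; how the paper itself aggregates them is not stated here.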
Pages: 131-138
Page count: 8
Related papers (50 in total)
  • [31] Missing data, imputation, and endogeneity
    McDonough, Ian K.
    Millimet, Daniel L.
    [J]. JOURNAL OF ECONOMETRICS, 2017, 199 (02) : 141 - 155
  • [32] Imputation of Missing Healthcare Data
    Chowdhury, Mohaimanul Hoque
    Islam, Muhammad Kamrul
    Khan, Shahidul Islam
    [J]. 2017 20TH INTERNATIONAL CONFERENCE OF COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2017
  • [34] BAYESIAN IMPUTATION FOR MISSING DATA
    Nads, Azman A.
    Polestico, Daisy Lou L.
    [J]. ADVANCES AND APPLICATIONS IN STATISTICS, 2022, 79 : 83 - 104
  • [35] Multiple imputation for analysis of incomplete data in distributed health data networks
    Chang, Changgee
    Deng, Yi
    Jiang, Xiaoqian
    Long, Qi
    [J]. NATURE COMMUNICATIONS, 2020, 11 (01)
  • [36] Multiple imputation for missing data
    Patrician, PA
    [J]. RESEARCH IN NURSING & HEALTH, 2002, 25 (01) : 76 - 84
  • [37] Imputation of missing data in surveys
    Rässler, S
    [J]. JAHRBUCHER FUR NATIONALOKONOMIE UND STATISTIK, 2000, 220 (01) : 64 - 94
  • [38] Multiple imputation of missing data
    Lydersen, Stian
    [J]. TIDSSKRIFT FOR DEN NORSKE LAEGEFORENING, 2022, 142 (02) : 151 - 151
  • [39] Missing Value Imputation of Time-Series Air-Quality Data via Deep Neural Networks
    Kim, Taesung
    Kim, Jinhee
    Yang, Wonho
    Lee, Hunjoo
    Choo, Jaegul
    [J]. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2021, 18 (22)
  • [40] Missing data imputation with adversarially-trained graph convolutional networks
    Spinelli, Indro
    Scardapane, Simone
    Uncini, Aurelio
    [J]. NEURAL NETWORKS, 2020, 129 : 249 - 260