Distributed Neural Networks for Missing Big Data Imputation

Cited by: 0
Authors
Petrozziello, Alessio [1 ,2 ]
Jordanov, Ivan [1 ]
Sommeregger, Christian [2 ]
Affiliations
[1] Univ Portsmouth, Portsmouth, Hants, England
[2] Expedia Inc, London, England
Keywords
Distributed Computation; Neural Networks; Missing Data Imputation; Big Data; ABSOLUTE ERROR MAE; RMSE;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we investigate the use of Distributed Neural Networks for the imputation of missing values in a Big Data context. The presented data imputation framework is implemented in Spark, allowing imputation to be added easily as an additional step in the data pre-processing pipeline. The Distributed Neural Networks model uses Mini-batch Stochastic Gradient Descent, which scales well with the cluster size and minimizes communication among the workers. The model is tested on a real-world Recommender Systems dataset, where missing data is generally a problem for new items, as the system's ranking is usually biased towards popular items. The model is compared with univariate (Mean and Median Imputation) and multivariate (K-Nearest Neighbours and Linear Regression) imputation techniques, and its performance is validated in terms of prediction accuracy and speed. Furthermore, we evaluate the speedup compared to the sequential implementation of Neural Networks with Stochastic Gradient Descent.
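The abstract's central idea, data-parallel mini-batch gradient descent used for imputation and scored against a univariate baseline with MAE and RMSE, can be illustrated with a small sketch. This is not the authors' Spark implementation: it simulates the workers in a single NumPy process, swaps the neural network for a plain linear model to stay short, and uses synthetic data. All function names (`worker_gradient`, `train_data_parallel`, `demo`) and hyperparameters are assumptions made for illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def worker_gradient(w, b, X_part, y_part, batch_size, rng):
    """MSE gradient on one mini-batch drawn from a single worker's partition."""
    n = min(batch_size, len(X_part))
    idx = rng.choice(len(X_part), size=n, replace=False)
    Xb, yb = X_part[idx], y_part[idx]
    err = Xb @ w + b - yb
    return 2.0 * Xb.T @ err / n, 2.0 * float(np.mean(err))

def train_data_parallel(X, y, n_workers=4, batch_size=32, lr=0.05,
                        steps=500, seed=0):
    """Simulated data-parallel mini-batch SGD (hypothetical helper).

    Each simulated worker holds one partition of the rows. Per step, only
    the per-worker gradients are combined (averaged), which is the sole
    communication between workers and the parameter holder.
    """
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_workers)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        grads = [worker_gradient(w, b, X[p], y[p], batch_size, rng)
                 for p in parts]
        w -= lr * np.mean([g[0] for g in grads], axis=0)
        b -= lr * float(np.mean([g[1] for g in grads]))
    return w, b

def demo(seed=0):
    """Impute a partly missing column and score it against mean imputation."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2000, 3))                 # fully observed features
    y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + 0.1 * rng.normal(size=2000)
    missing = rng.random(2000) < 0.2               # rows where y is "missing"
    obs = ~missing
    w, b = train_data_parallel(X[obs], y[obs])     # fit on observed rows only
    model_pred = X[missing] @ w + b                # multivariate imputation
    mean_pred = np.full(int(missing.sum()), y[obs].mean())  # univariate baseline
    return {
        "model_mae": mae(y[missing], model_pred),
        "model_rmse": rmse(y[missing], model_pred),
        "mean_mae": mae(y[missing], mean_pred),
        "mean_rmse": rmse(y[missing], mean_pred),
    }

if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

Because the held-out values are known in this synthetic setting, the imputed values can be scored directly. On a real cluster the per-worker gradient computation would run on separate executors, and only the averaging step would require communication.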
Pages: 131-138
Page count: 8
Related papers
50 in total
  • [21] Multivariate imputation of qualitative missing data using Bayesian networks
    Romero, V
    Salmerón, A
    [J]. SOFT METHODOLOGY AND RANDOM INFORMATION SYSTEMS, 2004, : 605 - 612
  • [22] Long-term missing value imputation for time series data using deep neural networks
    Jangho Park
    Juliane Müller
    Bhavna Arora
    Boris Faybishenko
    Gilberto Pastorello
    Charuleka Varadharajan
    Reetik Sahu
    Deborah Agarwal
    [J]. Neural Computing and Applications, 2023, 35 : 9071 - 9091
  • [23] Distributed personalized imputation based on Gaussian mixture model for missing data
    Sicong Chen
    Ying Liu
    [J]. Neural Computing and Applications, 2024, 36 (23) : 14237 - 14250
  • [24] Missing data imputation: focusing on single imputation
    Zhang, Zhongheng
    [J]. ANNALS OF TRANSLATIONAL MEDICINE, 2016, 4 (01)
  • [25] Processing of missing data by neural networks
    Smieja, Marek
    Struski, Lukasz
    Tabor, Jacek
    Zielinski, Bartosz
    Spurek, Przemyslaw
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [26] Missing Data: data replacement and imputation
    Hutcheson, Graeme
    Pampaka, Maria
    [J]. JOURNAL OF MODELLING IN MANAGEMENT, 2012, 7 (02)
  • [27] Missing Data and Multiple Imputation
    Cummings, Peter
    [J]. JAMA PEDIATRICS, 2013, 167 (07) : 656 - 661
  • [28] Missing Data Imputation: A Survey
    Kelkar, Bhagyashri Abhay
    [J]. INTERNATIONAL JOURNAL OF DECISION SUPPORT SYSTEM TECHNOLOGY, 2022, 14 (01)
  • [29] Missing Data and Imputation Methods
    Schober, Patrick
    Vetter, Thomas R.
    [J]. ANESTHESIA AND ANALGESIA, 2020, 131 (05) : 1419 - 1420
  • [30] MISSING DATA, IMPUTATION, AND THE BOOTSTRAP
    EFRON, B
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1994, 89 (426) : 463 - 475