Distributed Neural Networks for Missing Big Data Imputation

Cited by: 0
Authors
Petrozziello, Alessio [1, 2]
Jordanov, Ivan [1 ]
Sommeregger, Christian [2 ]
Affiliations
[1] Univ Portsmouth, Portsmouth, Hants, England
[2] Expedia Inc, London, England
Keywords
Distributed Computation; Neural Networks; Missing Data Imputation; Big Data; ABSOLUTE ERROR MAE; RMSE;
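DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract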
In this paper we investigate the use of Distributed Neural Networks for the imputation of missing values in a Big Data context. The presented data imputation framework is implemented in Spark, so imputation can be added as a step in the data pre-processing pipeline. The Distributed Neural Networks model uses Mini-batch Stochastic Gradient Descent, which scales well with the cluster size and minimizes communication among the workers. The model is tested on a real-world Recommender Systems dataset, where missing data is generally a problem for new items, as the system's ranking is usually biased towards popular items. The model is compared with univariate (Mean and Median Imputation) and multivariate (K-Nearest Neighbours and Linear Regression) imputation techniques, and its performance is validated using prediction accuracy and speed. Furthermore, we evaluate the speedup compared to the sequential implementation of Neural Networks with Stochastic Gradient Descent.
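The general idea of data-parallel Mini-batch Stochastic Gradient Descent for imputation can be sketched as follows. This is a minimal illustration and not the authors' Spark implementation: the synthetic data, the partitioning into four simulated workers, the per-step gradient averaging, and all function names are assumptions made for the example, and a single linear unit stands in for the neural network. Each simulated worker computes a gradient on a mini-batch drawn from its own partition, the gradients are averaged, and the result is compared with Mean Imputation using MAE and RMSE.

```python
import math
import random


def make_rows(n, rng):
    """Synthetic rows (x1, x2, y); y plays the role of the column to impute."""
    rows = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        y = 2.0 * x1 - 1.0 * x2 + 0.5 + rng.gauss(0, 0.05)
        rows.append((x1, x2, y))
    return rows


def local_gradient(weights, batch):
    """Squared-error gradient of a linear unit, averaged over one mini-batch."""
    grad = [0.0, 0.0, 0.0]
    for x1, x2, y in batch:
        err = weights[0] * x1 + weights[1] * x2 + weights[2] - y
        grad[0] += err * x1
        grad[1] += err * x2
        grad[2] += err
    return [g / len(batch) for g in grad]


def train(partitions, steps=300, batch_size=8, lr=0.2, seed=0):
    """Synchronous data-parallel SGD: one mini-batch per worker per step."""
    rng = random.Random(seed)
    weights = [0.0, 0.0, 0.0]
    for _ in range(steps):
        # Each simulated worker samples from its own partition only.
        grads = [local_gradient(weights, rng.sample(part, batch_size))
                 for part in partitions]
        # Only the small gradient vectors are combined, not the data.
        avg = [sum(g[i] for g in grads) / len(grads) for i in range(3)]
        weights = [w - lr * a for w, a in zip(weights, avg)]
    return weights


def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


rng = random.Random(42)
train_rows = make_rows(400, rng)
test_rows = make_rows(100, rng)           # rows whose y we pretend is missing
partitions = [train_rows[i::4] for i in range(4)]  # four simulated workers

weights = train(partitions)
truth = [y for _, _, y in test_rows]
model_pred = [weights[0] * x1 + weights[1] * x2 + weights[2]
              for x1, x2, _ in test_rows]

# Univariate baseline: fill every missing value with the training mean.
mean_value = sum(y for _, _, y in train_rows) / len(train_rows)
mean_pred = [mean_value] * len(test_rows)

print("model MAE/RMSE:", mae(model_pred, truth), rmse(model_pred, truth))
print("mean  MAE/RMSE:", mae(mean_pred, truth), rmse(mean_pred, truth))
```

In this sketch only the three-element gradient vectors are exchanged at each step, which is one simple way to keep communication small relative to the data held on each worker.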
Pages: 131 - 138
Number of pages: 8
Related papers
50 in total
  • [41] LOW-DIMENSIONAL MODELS FOR MISSING DATA IMPUTATION IN ROAD NETWORKS
    Asif, Muhammad Tayyab
    Mitrovic, Nikola
    Garg, Lalit
    Dauwels, Justin
    Jaillet, Patrick
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 3527 - 3531
  • [42] Evaluation of Missing Data Imputation Methods for an Enhanced Distributed PV Generation Prediction
    Sundararajan, Aditya
    Sarwat, Arif I.
    [J]. PROCEEDINGS OF THE FUTURE TECHNOLOGIES CONFERENCE (FTC) 2019, VOL 1, 2020, 1069 : 590 - 609
  • [43] Communication efficient distributed learning of neural networks in Big Data environments using Spark
    Alkhoury, Fouad
    Wegener, Dennis
    Sylla, Karl-Heinz
    Mock, Michael
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 3871 - 3877
  • [44] Management of Distributed Big Data for Social Networks
    Leung, Carson K.
    Zhang, Hao
    [J]. 2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2016, : 639 - 648
  • [45] Recurrent neural networks for missing or asynchronous data
    Bengio, Y
    Gingras, F
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8 : 395 - 401
  • [46] Analysis of missing data with artificial neural networks
    Pastor, JBN
    Vidal, JML
    [J]. PSICOTHEMA, 2000, 12 (03) : 503 - 510
  • [47] MISSING DATA IMPUTATION FOR HEALTH CARE BIG DATA USING DENOISING AUTOENCODER WITH GENERATIVE ADVERSARIAL NETWORK
    Zhang, Yinbing
    [J]. SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2024, 25 (05) : 3850 - 3857
  • [48] MisConv: Convolutional Neural Networks for Missing Data
    Przewiezlikowski, Marcin
    Smieja, Marek
    Struski, Lukasz
    Tabor, Jacek
    [J]. 2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 2917 - 2926
  • [49] From Missing Data Imputation to Data Generation
    Neves, Diogo Telmo
    Alves, Joao
    Naik, Marcel Ganesh
    Proenca, Alberto Jose
    Prasser, Fabian
    [J]. JOURNAL OF COMPUTATIONAL SCIENCE, 2022, 61
  • [50] Neural Models for Imputation of Missing Ozone Data in Air-Quality Datasets
    Arroyo, Angel
    Herrero, Alvaro
    Tricio, Veronica
    Corchado, Emilio
    Wozniak, Michal
    [J]. COMPLEXITY, 2018,