Distributed Neural Networks for Missing Big Data Imputation

Authors
Petrozziello, Alessio [1, 2]
Jordanov, Ivan [1]
Sommeregger, Christian [2]
Affiliations
[1] Univ Portsmouth, Portsmouth, Hants, England
[2] Expedia Inc, London, England
Keywords
Distributed Computation; Neural Networks; Missing Data Imputation; Big Data; Absolute Error (MAE); RMSE
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we investigate the use of Distributed Neural Networks for the imputation of missing values in a Big Data context. The presented data imputation framework is implemented in Spark, so imputation can easily be added as an extra step in the data pre-processing pipeline. The Distributed Neural Networks model uses Mini-batch Stochastic Gradient Descent, which scales well with the cluster size and minimizes communication among the workers. The model is tested on a real-world Recommender Systems dataset, where missing data are generally a problem for new items, as the system's ranking is usually biased towards popular items. The model is compared with univariate (Mean and Median Imputation) and multivariate (K-Nearest Neighbours and Linear Regression) imputation techniques, and its performance is validated in terms of prediction accuracy and speed. Furthermore, we evaluate the speedup relative to a sequential implementation of Neural Networks with Stochastic Gradient Descent.
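The abstract describes the training scheme only at a high level. The following is a minimal sketch, using only the Python standard library, of one common way to realise data-parallel mini-batch SGD for imputation, under several stated assumptions: the workers are simulated in a single process rather than run on Spark; each worker computes the gradient on its shard of a mini-batch and a driver averages the results (synchronous gradient averaging; the abstract does not say which communication scheme the authors use); and, to keep the example short, the model is a linear regressor rather than a neural network. The toy dataset, the helpers `make_data`, `predict`, `worker_gradient` and `distributed_minibatch_sgd`, and all hyperparameters are hypothetical. The imputations are then scored against a mean-imputation baseline with RMSE and MAE.

```python
import math
import random


def make_data(n, seed):
    """Hypothetical toy data: the column to impute, y, depends linearly on two observed features."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = 3.0 * x[0] - 2.0 * x[1] + 0.5 + rng.gauss(0.0, 0.1)
        rows.append((x, y))
    return rows


def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def worker_gradient(w, b, shard):
    """Gradient of the summed half squared error over one worker's shard (computed locally)."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in shard:
        err = predict(w, b, x) - y
        for j, xj in enumerate(x):
            gw[j] += err * xj
        gb += err
    return gw, gb


def distributed_minibatch_sgd(rows, n_workers=4, batch_size=32, lr=0.1, epochs=20, seed=0):
    """Synchronous data-parallel mini-batch SGD; workers are simulated in one process (assumption)."""
    rng = random.Random(seed)
    dim = len(rows[0][0])
    w, b = [0.0] * dim, 0.0
    data = list(rows)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Split the mini-batch round-robin across the simulated workers.
            shards = [batch[k::n_workers] for k in range(n_workers)]
            partials = [worker_gradient(w, b, shard) for shard in shards]
            # Driver step: each worker sends one gradient vector; sum them and average over the batch.
            m = len(batch)
            gw = [sum(p[0][j] for p in partials) / m for j in range(dim)]
            gb = sum(p[1] for p in partials) / m
            w = [wj - lr * gj for wj, gj in zip(w, gw)]
            b -= lr * gb
    return w, b


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


train = make_data(800, seed=1)
held_out = make_data(200, seed=2)  # rows whose y we treat as missing
true = [y for _, y in held_out]

w, b = distributed_minibatch_sgd(train)
model_pred = [predict(w, b, x) for x, _ in held_out]

mean_y = sum(y for _, y in train) / len(train)  # univariate mean-imputation baseline
mean_pred = [mean_y] * len(held_out)

print(f"SGD imputer: RMSE={rmse(model_pred, true):.3f}  MAE={mae(model_pred, true):.3f}")
print(f"Mean impute: RMSE={rmse(mean_pred, true):.3f}  MAE={mae(mean_pred, true):.3f}")
```

Because the driver divides the summed shard gradients by the full mini-batch size, running with `n_workers=1` or `n_workers=4` gives the same parameters up to floating-point rounding. Only one gradient vector per worker is exchanged per step, which is the property that keeps communication low as the cluster grows; swapping in a neural network would change `predict` and `worker_gradient` but not the driver loop.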
Pages: 131-138
Number of pages: 8