High-performance IoT streaming data prediction system using Spark: a case study of air pollution

Authors
Ho-Yong Jin
Eun-Sung Jung
Duckki Lee
Affiliations
[1] Hongik University, Department of Software and Communications Engineering
[2] Yonam Institute of Technology, Department of Smart Software
Keywords
Long Short-Term Memory (LSTM); Distributed deep learning; Distributed Keras (Dist-Keras); Apache Spark
DOI
Not available
Abstract
Internet-of-Things (IoT) devices are becoming prevalent, and some of them, such as sensors, generate continuous time-series data, i.e., streaming data. These IoT data streams are one source of Big Data, and they require careful consideration for efficient processing and analysis. Deep learning is emerging as a solution for IoT streaming data analytics. However, a persistent problem in deep learning is that training neural networks takes a long time. In this paper, we propose a high-performance IoT streaming data prediction system that speeds up training and predicts in real time. We demonstrate the efficacy of the system through a case study of air pollution. The experimental results show that the modified LSTM autoencoder model outperforms a generic LSTM model. We found that achieving the best performance requires tuning many hyperparameters, including the learning rate, number of epochs, memory cell size, input timestep size, and number of features (predictors). We therefore show that high-performance data learning/prediction frameworks (e.g., Spark, Dist-Keras, and Hadoop) are essential for rapidly fine-tuning a model through training and testing before it is deployed, as data continue to accumulate.
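The abstract names the hyperparameters that had to be tuned (learning rate, epochs, memory cell size, input timestep size, number of features) but not how model inputs were built or how the search was organized. The sketch below is an illustrative assumption, not the authors' code. It shows how an input timestep size turns a multivariate sensor series into LSTM training windows, and how even a small grid over those hyperparameters multiplies into many independent training runs. That multiplication is the kind of workload a cluster framework can spread across workers. The names `make_windows`, `grid`, and `SEARCH_SPACE`, the candidate values, and the choice of feature 0 as the prediction target are all hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple


def make_windows(
    series: Sequence[Sequence[float]], timesteps: int, horizon: int = 1
) -> Tuple[List[List[List[float]]], List[float]]:
    """Turn a multivariate time series (rows = time steps, columns = features)
    into (input window, target) pairs for a sequence model such as an LSTM.

    Each input holds `timesteps` consecutive rows. The target is feature 0,
    `horizon` steps after the window ends (hypothetical: feature 0 stands in
    for the pollutant being predicted)."""
    if timesteps < 1 or horizon < 1:
        raise ValueError("timesteps and horizon must be positive")
    inputs, targets = [], []
    for start in range(len(series) - timesteps - horizon + 1):
        end = start + timesteps
        inputs.append([list(row) for row in series[start:end]])
        targets.append(series[end + horizon - 1][0])
    return inputs, targets


# Hypothetical search space over the hyperparameters listed in the abstract.
# "num_features" would decide how many sensor columns are kept before
# windowing; that column selection is not implemented in this sketch.
SEARCH_SPACE: Dict[str, list] = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "epochs": [50, 100],
    "memory_cells": [32, 64, 128],
    "timesteps": [6, 12, 24],
    "num_features": [1, 4, 8],
}


def grid(space: Dict[str, list]) -> Iterator[Dict[str, object]]:
    """Yield every combination in the search space as one config dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    series = [[float(t), 2.0 * t] for t in range(10)]
    X, y = make_windows(series, timesteps=3)
    print(len(X), y[0])  # 7 3.0
    print(sum(1 for _ in grid(SEARCH_SPACE)))  # 162
```

With this hypothetical search space, `grid` yields 3 × 2 × 3 × 3 × 3 = 162 configurations. Each one trains independently, so the runs could in principle be dispatched in parallel. The abstract does not describe how the authors actually distributed training with Dist-Keras on Spark.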
Pages: 13147–13154