Evolving large-scale data stream analytics based on scalable PANFIS

Cited by: 6
Authors
Za'in, Choiru [1, 3]
Pratama, Mahardhika [2, 4]
Pardede, Eric [1, 3]
Affiliations
[1] Plenty Rd & Kingsbury Dr, Bundoora, Vic 3086, Australia
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, 50 Nanyang Ave, Singapore 639798, Singapore
[3] La Trobe Univ, Bundoora, Vic, Australia
[4] Nanyang Technol Univ, Singapore, Singapore
Keywords
Large-scale data stream analytics; Distributed data stream mining; Parallel data stream processing; Scalable machine learning; Big data; Knowledge integration (fusion); BIG DATA; MAPREDUCE; SYSTEMS; FUSION;
DOI
10.1016/j.knosys.2018.12.028
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
The main challenge in large-scale data stream analytics is enabling machine learning to extract knowledge from large-scale data within a reasonable timeframe without a loss of accuracy. Many distributed machine learning frameworks have recently been built to speed up learning from large-scale data. However, most distributed machine learning algorithms used in these frameworks still rely on an offline model, which cannot cope with data stream problems. In fact, large-scale data are mostly generated by non-stationary data streams whose patterns evolve over time. To address this problem, we propose a novel Evolving Large-scale Data Stream Analytics framework based on a Scalable Parsimonious Network based on Fuzzy Inference System (Scalable PANFIS), in which the PANFIS evolving algorithm is distributed over the worker nodes in the cloud to learn from large-scale data streams. The Scalable PANFIS framework incorporates an active learning (AL) strategy and two model fusion methods. AL accelerates the distributed learning process that generates an initial evolving large-scale data stream model (the initial model), whereas the two model fusion methods aggregate the initial model to generate the final model. The final model represents the updated knowledge of the current large-scale data and can be used to infer future data. The framework is validated through extensive experiments that measure the accuracy and running time of four combinations of Scalable PANFIS and of other Spark-based built-in algorithms. The results indicate that Scalable PANFIS with AL trains almost twice as fast as Scalable PANFIS without AL. The results also show that the rule merging and voting mechanisms yield generally similar accuracy among the Scalable PANFIS algorithms, and that they are generally better than the Spark-based algorithms. In terms of running time, Scalable PANFIS training outperforms all Spark-based algorithms when classifying a multi-class label dataset. © 2019 Elsevier B.V. All rights reserved.
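The overall flow described in the abstract (per-worker learning on data partitions, an active learning filter, and fusion by voting) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `CentroidLearner` is a hypothetical nearest-centroid stand-in and not PANFIS, the margin-based uncertainty test is only one possible active learning criterion, the workers run serially rather than on a cluster, and only the voting fusion is shown (rule merging is not).

```python
from collections import Counter


class CentroidLearner:
    """Hypothetical per-worker incremental learner (a stand-in, NOT PANFIS)."""

    def __init__(self):
        self.sums = {}    # class label -> per-feature running sums
        self.counts = {}  # class label -> number of samples used

    def update(self, x, y):
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        self.counts[y] = self.counts.get(y, 0) + 1

    def _distances(self, x):
        # Squared Euclidean distance from x to each class centroid.
        return {
            y: sum((v - si / self.counts[y]) ** 2 for v, si in zip(x, s))
            for y, s in self.sums.items()
        }

    def predict(self, x):
        d = self._distances(x)
        return min(d, key=d.get)

    def is_uncertain(self, x, margin=0.5):
        # Assumed criterion: a sample is worth learning from when the two
        # nearest centroids are almost equally close (or fewer than two exist).
        d = sorted(self._distances(x).values())
        return len(d) < 2 or (d[1] - d[0]) < margin


def train_worker(partition, use_al=True):
    """Train one model on one partition; return (model, samples actually used)."""
    model = CentroidLearner()
    used = 0
    for x, y in partition:
        if not use_al or model.is_uncertain(x):
            model.update(x, y)
            used += 1
    return model, used


def vote(models, x):
    """Fuse the worker models by majority vote over their predictions."""
    ballots = Counter(m.predict(x) for m in models)
    return ballots.most_common(1)[0][0]


# Toy stream split into three partitions, one per (simulated) worker.
partitions = [
    [((0.0, 0.1), "a"), ((5.0, 5.1), "b"), ((0.2, 0.0), "a"), ((4.9, 5.0), "b")],
    [((0.1, 0.2), "a"), ((5.2, 4.8), "b"), ((0.0, 0.0), "a"), ((5.0, 5.0), "b")],
    [((0.3, 0.1), "a"), ((4.8, 5.2), "b"), ((0.1, 0.1), "a"), ((5.1, 4.9), "b")],
]
results = [train_worker(p, use_al=True) for p in partitions]
models = [m for m, _ in results]
print("samples used per worker:", [n for _, n in results])
print("fused prediction:", vote(models, (0.2, 0.2)))
```

In this toy setting the filter skips samples that the worker's current model already separates clearly, which is the intuition behind the reported training-time saving; the actual speed-up and accuracy figures are those reported in the abstract, not something this sketch reproduces.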
Pages: 186-197
Number of pages: 12