Evolving large-scale data stream analytics based on scalable PANFIS

Cited by: 6

Authors:

Za'in, Choiru [1,3]

Pratama, Mahardhika [2,4]

Pardede, Eric [1,3]

Affiliations:

[1] Plenty Rd & Kingsbury Dr, Bundoora, Vic 3086, Australia

[2] Nanyang Technol Univ, Sch Comp Sci & Engn, 50 Nanyang Ave, Singapore 639798, Singapore

[3] La Trobe Univ, Bundoora, Vic, Australia

[4] Nanyang Technol Univ, Singapore, Singapore

Source:

KNOWLEDGE-BASED SYSTEMS | 2019 / Vol. 166

Keywords:

Large-scale data stream analytics; Distributed data stream mining; Parallel data stream processing; Scalable machine learning; Big data; Knowledge integration (fusion); MapReduce; Systems; Fusion

DOI:

10.1016/j.knosys.2018.12.028

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The main challenge in large-scale data stream analytics is enabling machine learning to extract knowledge from large-scale data within a reasonable timeframe without loss of accuracy. Many distributed machine learning frameworks have recently been built to speed up large-scale learning. However, most algorithms used in these frameworks still rely on an offline model, which cannot cope with data streams. In fact, large-scale data are mostly generated by non-stationary data streams whose patterns evolve over time. To address this problem, we propose a novel Evolving Large-scale Data Stream Analytics framework based on a Scalable Parsimonious Network based on Fuzzy Inference System (Scalable PANFIS), in which the PANFIS evolving algorithm is distributed over worker nodes in the cloud to learn from large-scale data streams. The Scalable PANFIS framework incorporates an active learning (AL) strategy and two model fusion methods. AL accelerates the distributed learning process that generates an initial evolving large-scale data stream model (the initial model), whereas the two model fusion methods aggregate the initial model to generate the final model. The final model represents the updated large-scale data knowledge and can be used to infer future data. The framework is validated through extensive experiments that measure the accuracy and running time of four combinations of Scalable PANFIS and of other built-in Spark-based algorithms. The results indicate that Scalable PANFIS with AL trains almost twice as fast as Scalable PANFIS without AL. They also show that rule merging and voting generally yield similar accuracy across the Scalable PANFIS variants, and that both are generally more accurate than the Spark-based algorithms. In terms of running time, Scalable PANFIS trains faster than all Spark-based algorithms when classifying a multi-class dataset. © 2019 Elsevier B.V. All rights reserved.
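As an illustration only, the voting style of model fusion named in the abstract can be sketched as a plain majority vote over the predictions of models trained on separate workers. The abstract does not specify the voting rule, so the function name `majority_vote`, the tie-breaking rule, and the toy threshold classifiers below are all assumptions and not the authors' implementation.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# A "model" here is any callable mapping a feature vector to a class label.
Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(models: Sequence[Classifier], x: Sequence[float]) -> Hashable:
    """Return the label predicted by the most models.

    Ties are broken in favour of the label voted by the earliest model
    (an assumption made for this sketch).
    """
    if not models:
        raise ValueError("at least one model is required")
    votes = [model(x) for model in models]
    counts = Counter(votes)
    best = max(counts.values())
    # Walk the votes in model order so that ties resolve deterministically.
    for label in votes:
        if counts[label] == best:
            return label


def make_threshold_model(threshold: float) -> Classifier:
    """Hypothetical stand-in for a model trained on one worker node."""
    return lambda x: 1 if x[0] > threshold else 0


# Three toy "worker" models that disagree near their thresholds.
worker_models = [make_threshold_model(t) for t in (0.3, 0.5, 0.7)]
```

With these toy models, `majority_vote(worker_models, [0.6])` collects the votes 1, 1, 0 and returns the majority label. Rule merging, the other fusion method the abstract mentions, would combine the models themselves rather than their predictions and is not sketched here.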

Pages: 186-197

Page count: 12
