A sharing data approach oriented to distributed online learning

Cited by: 0
Authors
Zhang Y. [1, 2]
Liu W. [1]
Shao L.-S. [2]
Affiliations
[1] College of Science, Liaoning Technical University, Fuxin
[2] Research Centre in Management Science, Liaoning Technical University, Huludao
Source
Kongzhi yu Juece/Control and Decision | 2021 / Vol. 36 / No. 08
Keywords
Distributed data stream; Global learner; Online learning; Rebuilding data set; Semi-supervised clustering; Sharing data
DOI
10.13195/j.kzyjc.2019.1811
Abstract
Distributed data streams generated by current data-driven applications have become a major form of data representation. Although distributed data streams are captured from different data sources, they are correlated with a common event. Hence, the key issue in distributed online learning is how to build a global learner by sharing the data of local nodes. To address this problem, this paper proposes a data-sharing solution for distributed online learning that consists of a semi-supervised clustering approach based on exponential loss and a data-sharing approach based on covariance matrices and mean vectors. It also proves that, with a certain probability, the cumulative absolute error between the rebuilt data set and the original data set is bounded by a given threshold. Experimental results demonstrate that the proposed approach incurs lower network traffic between nodes and yields a learner with better generalization capability. Copyright ©2021 Control and Decision.
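The general idea of sharing data through per-cluster mean vectors and covariance matrices, then rebuilding a data set from those statistics at another node, can be illustrated with a minimal Python sketch. This is not the authors' method: the cluster labels are assumed to be given (the paper's exponential-loss semi-supervised clustering is not implemented), rebuilding by Gaussian sampling is an assumption, and the names `summarize_node`, `payload_size` and `rebuild` are hypothetical. The printed mean discrepancy is only an informal check, not the paper's probabilistic error bound.

```python
import numpy as np


def summarize_node(X, labels):
    """Compress a node's data into per-cluster (label, count, mean, covariance).

    `labels` stands in for the output of a local clustering step (assumption);
    this sketch does not implement the paper's exponential-loss clustering.
    """
    summaries = []
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean = Xc.mean(axis=0)
        # rowvar=False: rows are samples; bias=True: normalize by n
        cov = np.atleast_2d(np.cov(Xc, rowvar=False, bias=True))
        summaries.append((c, len(Xc), mean, cov))
    return summaries


def payload_size(summaries):
    """Count the numbers a node would transmit per cluster: 1 count, d means
    and the d*(d+1)/2 distinct entries of the symmetric covariance matrix
    (cluster ids are assumed implicit in message order)."""
    total = 0
    for _, _, mean, _ in summaries:
        d = mean.shape[0]
        total += 1 + d + d * (d + 1) // 2
    return total


def rebuild(summaries, rng):
    """At the receiving node, draw n samples from N(mean, cov) per cluster to
    form a surrogate ("rebuilt") data set together with its cluster labels."""
    parts, ys = [], []
    for c, n, mean, cov in summaries:
        parts.append(rng.multivariate_normal(mean, cov, size=n))
        ys.append(np.full(n, c))
    return np.vstack(parts), np.concatenate(ys)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical local window: two 3-D clusters of 500 points each
    X = np.vstack([rng.normal(0.0, 1.0, size=(500, 3)),
                   rng.normal(5.0, 0.5, size=(500, 3))])
    labels = np.repeat([0, 1], 500)

    summaries = summarize_node(X, labels)
    X_rebuilt, y_rebuilt = rebuild(summaries, rng)

    print("raw numbers:", X.size)                      # 3000
    print("shared numbers:", payload_size(summaries))  # 2 * (1 + 3 + 6) = 20
    for c, _, mean, _ in summaries:
        err = np.abs(X_rebuilt[y_rebuilt == c].mean(axis=0) - mean).max()
        print(f"cluster {c}: max |mean error| = {err:.3f}")
```

Sending only the upper triangle of each covariance matrix keeps the message size independent of the number of records in the window, which is why the shared payload (20 numbers) is far smaller than the raw window (3000 numbers) in this toy setting.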
Pages: 1871-1880
Page count: 9