Learning High-Dimensional Evolving Data Streams With Limited Labels

Times cited: 5
Authors
Din, Salah Ud [1,2,3]
Kumar, Jay [1,2]
Shao, Junming [1,2]
Mawuli, Cobbinah Bernard [1,2]
Ndiaye, Waldiodio David [1,2]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China, Yangtze Delta Region Inst Huzhou, Huzhou 313001, Peoples R China
[3] COMSATS Univ Islamabad, Dept Comp Sci, Islamabad 45550, Pakistan
Funding
National Natural Science Foundation of China
Keywords
Clustering algorithms; Heuristic algorithms; Feature extraction; Classification algorithms; Data models; Data mining; Noise reduction; Concept drift; denoising autoencoder (DAE); evolving data streams; semisupervised learning (SSL); synchronization; CONCEPT DRIFT; CLASSIFICATION; AUTOENCODERS; ENSEMBLE
DOI
10.1109/TCYB.2021.3070420
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In the context of streaming data, learning algorithms often need to confront several unique challenges, such as concept drift, label scarcity, and high dimensionality. Several concept-drift-aware data stream learning algorithms have been proposed over the past decades to tackle these issues. However, most existing algorithms use a supervised learning framework and require all true class labels to update their models. Unfortunately, in a streaming environment, obtaining all labels is infeasible and unrealistic in many real-world applications. Therefore, learning from data streams with minimal labels is a more practical scenario. Considering the curse of dimensionality and label scarcity, in this article, we present a new semisupervised learning technique for streaming data. To mitigate the curse of dimensionality, we employ a denoising autoencoder to transform the high-dimensional feature space into a reduced, compact, and more informative feature representation. Furthermore, we use a cluster-and-label technique to reduce the dependency on true class labels. We employ a synchronization-based dynamic clustering technique to summarize the streaming data into a set of dynamic microclusters that are further used for classification. In addition, we employ a disagreement-based learning method to cope with concept drift. Extensive experiments performed on many real-world datasets demonstrate the superior performance of the proposed method compared to several state-of-the-art methods.
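The abstract only names its clustering components. As a rough illustration of how a synchronization-based clustering step and a cluster-and-label classifier can fit together, the sketch below uses a Kuramoto-style interaction in the spirit of the SynC algorithm: each point moves by the mean of sin(y - x), taken per dimension, over its eps-neighbours until neighbourhoods collapse into microclusters. This update rule, the eps value, the convergence tolerance, the use of -1 to mark unlabelled points, and the functions `sync_cluster`, `label_microclusters` and `predict` are all hypothetical choices for illustration, not the authors' implementation. The denoising autoencoder, the streaming maintenance of microclusters and the disagreement-based drift handling are omitted.

```python
import numpy as np


def sync_cluster(X, eps=0.1, max_iter=50, tol=1e-6):
    """Hypothetical SynC-style synchronization clustering (batch, not streaming).

    Assumption: each point moves by the mean of sin(x_j - x_i), per dimension,
    over its eps-neighbours; this is not necessarily the paper's exact rule.
    Returns one integer microcluster id per point.
    """
    X = np.asarray(X, dtype=float).copy()
    for _ in range(max_iter):
        diff = X[None, :, :] - X[:, None, :]      # diff[i, j] = x_j - x_i
        dist = np.linalg.norm(diff, axis=2)
        nbr = dist <= eps                          # neighbourhood includes the point itself
        # Mean interaction sin(x_j - x_i) over each point's eps-neighbourhood.
        step = (np.sin(diff) * nbr[:, :, None]).sum(axis=1) / nbr.sum(axis=1, keepdims=True)
        moved = np.max(np.abs(step))
        X = X + step
        if moved < tol:                            # positions have (nearly) stopped changing
            break
    # Points whose synchronized positions coincide form one microcluster.
    ids = -np.ones(len(X), dtype=int)
    next_id = 0
    for i in range(len(X)):
        if ids[i] == -1:
            close = np.linalg.norm(X - X[i], axis=1) <= eps / 10
            ids[close & (ids == -1)] = next_id
            next_id += 1
    return ids


def label_microclusters(X, ids, y):
    """Cluster-and-label: each microcluster takes the majority label of its
    labelled members (y == -1 means unlabelled). Clusters without any labelled
    member are skipped in this sketch. Returns (centres, labels)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    centres, labels = [], []
    for c in np.unique(ids):
        members = ids == c
        known = y[members & (y != -1)]
        if known.size == 0:
            continue
        values, counts = np.unique(known, return_counts=True)
        centres.append(X[members].mean(axis=0))    # centre in the original feature space
        labels.append(values[np.argmax(counts)])
    return np.array(centres), np.array(labels)


def predict(centres, labels, X_new):
    """Assign each new point the label of its nearest labelled microcluster."""
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    d = np.linalg.norm(X_new[:, None, :] - centres[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)]


# Usage: two tight groups, only one labelled point in each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.2, 0.01, (20, 2)), rng.normal(0.8, 0.01, (20, 2))])
y = -np.ones(40, dtype=int)
y[0], y[20] = 0, 1
ids = sync_cluster(X, eps=0.1)
centres, labels = label_microclusters(X, ids, y)
print(predict(centres, labels, [[0.21, 0.19], [0.79, 0.81]]))  # expected: [0 1]
```

With a single labelled point per group, the majority vote inside each synchronized microcluster spreads that label to the unlabelled members, which is the basic idea behind cluster-and-label; how the paper handles microclusters with no labelled members is not stated in the abstract.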
Pages: 11373-11384
Number of pages: 12
Related papers
  • [1] A reliable adaptive prototype-based learning for evolving data streams with limited labels
    Din, Salah Ud
    Ullah, Aman
    Mawuli, Cobbinah B.
    Yang, Qinli
    Shao, Junming
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (01)
  • [2] Dynamic Sparse Subspace Clustering for Evolving High-Dimensional Data Streams
    Sui, Jinping
    Liu, Zhen
    Liu, Li
    Jung, Alexander
    Li, Xiang
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (06) : 4173 - 4186
  • [3] Evolving insight into high-dimensional data
    Tu, YQ
    Li, G
    Dai, H
    [J]. ADVANCES IN INTELLIGENT COMPUTING, PT 1, PROCEEDINGS, 2005, 3644 : 465 - 474
  • [4] Classification of high-dimensional evolving data streams via a resource-efficient online ensemble
    Zhai, Tingting
    Gao, Yang
    Wang, Hao
    Cao, Longbing
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2017, 31 (05) : 1242 - 1265
  • [5] Learning high-dimensional data
    Verleysen, M
    [J]. LIMITATIONS AND FUTURE TRENDS IN NEURAL COMPUTATION, 2003, 186 : 141 - 162
  • [6] Learning evolving prototypes for imbalanced data stream classification with limited labels
    Wu, Zhonglin
    Wang, Hongliang
    Guo, Jingxia
    Yang, Qinli
    Shao, Junming
    [J]. INFORMATION SCIENCES, 2024, 679
  • [7] Detecting Projected Outliers in High-Dimensional Data Streams
    Zhang, Ji
    Gao, Qigang
    Wang, Hai
    Liu, Qing
    Xu, Kai
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2009, 5690 : 629 - +
  • [8] Online Pattern Mining for High-Dimensional Data Streams
    Yamamoto, Yoshitaka
    Iwanuma, Koji
    [J]. PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2880 - 2882
  • [9] Generalized projected clustering in high-dimensional data streams
    Wang, T
    [J]. FRONTIERS OF WWW RESEARCH AND DEVELOPMENT - APWEB 2006, PROCEEDINGS, 2006, 3841 : 772 - 778