Data heterogeneity consideration in semi-supervised learning

Cited by: 8
Authors
Araujo, Bilza [1, 2]
Zhao, Liang [3]
Affiliations
[1] Fed Univ Southern Bahia, Inst Humanities Arts & Sci, BR-45810000 Porto Seguro, BA, Brazil
[2] Univ Sao Paulo, Inst Math & Comp Sci, Dept Comp Sci, BR-13560970 Sao Paulo, Brazil
[3] Univ Sao Paulo, Sch Philosophy Sci & Literature Ribeirao Preto, Dept Computat & Math, BR-14090901 Sao Paulo, Brazil
Keywords
Semi-supervised learning; Graph construction; Complex networks; Representatives selection; Principal components analysis; Dimensionality reduction; Community structure; Graph; Centrality; Internet; Set
DOI
10.1016/j.eswa.2015.09.026
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the class (cluster) formation process of machine learning techniques, data instances are usually assumed to have equal relevance. However, this assumption frequently does not hold. The situation is especially typical in semi-supervised learning, where the structure of both labeled and unlabeled data must be understood at the same time. In this paper, we investigate the organizational heterogeneity of data in semi-supervised learning using a graph representation. A graph is a natural choice for characterizing the relationship between any pair of nodes or any pair of groups of nodes; consequently, the strategic location of each node or group of nodes can be determined by graph measures. Specifically, two issues are addressed: (1) We propose an adaptive graph construction method, which we call AdaRadius, that considers the heterogeneity of the local interacting structure among nodes. As a result, it presents several interesting properties, namely adaptability to data density variations, low dependency on parameter settings, and reasonable computational cost, for both pool-based and incremental data. (2) We present heuristic criteria for selecting representative data samples to be labeled. Our experimental study shows that selective labeling usually yields better classification results than random labeling. To our knowledge, both issues have so far lacked investigation; therefore, our approach represents an important step toward characterizing data heterogeneity not only in semi-supervised learning but also in machine learning in general. © 2015 Elsevier Ltd. All rights reserved.
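The abstract does not specify how AdaRadius or the selection criteria work, so the following is only a rough illustration of the two general ideas it names: a neighborhood graph whose radius adapts to local density, and the use of a graph measure to pick samples for labeling. Everything here is an assumption made for illustration and is not the authors' method: the function names, the choice of the k-th nearest-neighbor distance as the local radius, and the use of node degree as the selection criterion.

```python
import math


def adaptive_radius_graph(points, k=2):
    """Build an undirected graph in which each point uses its own radius.

    Assumption (not from the paper): the radius of a point is the distance
    to its k-th nearest neighbor, so dense regions get small radii and
    sparse regions get large ones.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        if not others:
            continue  # a single point has no neighbors
        radius = others[min(k, len(others)) - 1]
        for j in range(n):
            if j != i and dist[i][j] <= radius:
                adj[i].add(j)
                adj[j].add(i)  # keep the graph undirected
    return adj


def select_representatives(adj, m):
    """Return the m nodes with the highest degree (ties broken by index).

    Degree is a stand-in graph measure chosen for this sketch only.
    """
    return sorted(adj, key=lambda i: (-len(adj[i]), i))[:m]


# Hypothetical toy data: a dense square and a sparse, distant pair.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 12)]
graph = adaptive_radius_graph(points, k=1)
to_label = select_representatives(graph, m=2)
```

In this toy data the distant pair is still connected to each other, because each point's radius grows to reach its nearest neighbor, whereas a single global radius tuned for the dense square would leave both points isolated.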
Pages: 234-247
Page count: 14