Data driven semi-supervised learning

Times cited: 0
Authors
Balcan, Maria-Florina [1]
Sharma, Dravyansh [2]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider a novel data-driven approach for designing semi-supervised learning algorithms that can learn effectively from only a small number of labeled examples. We focus on graph-based techniques, in which the unlabeled examples are connected in a graph under the implicit assumption that similar nodes likely have similar labels. Over the past two decades, several elegant graph-based semi-supervised learning algorithms have been proposed for inferring the labels of the unlabeled examples given the graph and a few labeled examples. However, the question of how to create the graph, which significantly affects the practical usefulness of these methods, has been relegated to heuristics and domain-specific art, and no general principles have been proposed. In this work we present a data-driven approach for learning the graph and provide strong formal guarantees in both the distributional and online learning formalizations. We show how to leverage problem instances coming from an underlying problem domain to learn, for commonly used parametric families of graphs, hyperparameters that provably perform well on new instances from the same domain. We obtain efficient low-regret algorithms in the online setting and generalization guarantees in the distributional setting. We also show how to combine several very different similarity metrics and learn multiple hyperparameters; our results hold for large classes of problems. We expect some of the tools and techniques we develop along the way to be of independent interest for data-driven algorithms more generally.
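The abstract describes a two-level pipeline: a graph is built from the examples using some hyperparameter, a label-inference algorithm runs on that graph, and the hyperparameter itself is chosen using earlier problem instances from the same domain. The Python sketch below illustrates that pipeline under assumptions of our own, none of which are stated in the abstract: a Gaussian-kernel graph family with a single bandwidth `sigma`, binary labels, the harmonic-function method of Zhu, Ghahramani and Lafferty (2003) approximated by fixed-point iteration, and a plain grid search that minimizes average error over past instances. The helper names (`gaussian_graph`, `harmonic_scores`, `instance_error`, `select_sigma`) and the toy data are hypothetical, and the grid search is a simple stand-in, not the authors' online or distributional algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
# Hypothetical instance format: (points, true binary labels, indices whose labels are revealed)
Instance = Tuple[List[Point], List[int], List[int]]


def gaussian_graph(points: Sequence[Point], sigma: float) -> List[List[float]]:
    """Dense weights w_ij = exp(-||x_i - x_j||^2 / sigma^2), with w_ii = 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            # For very small sigma the weight underflows to 0.0 (the graph falls apart).
            w[i][j] = w[j][i] = math.exp(-d2 / sigma ** 2)
    return w


def harmonic_scores(w: List[List[float]], revealed: Dict[int, int], iters: int = 500) -> List[float]:
    """Approximate the harmonic solution: labeled nodes stay clamped to their label,
    and each unlabeled score is repeatedly set to the weighted mean of its neighbours."""
    n = len(w)
    f = [float(revealed.get(i, 0.5)) for i in range(n)]
    for _ in range(iters):
        new_f = f[:]
        for i in range(n):
            if i in revealed:
                continue
            total = sum(w[i])
            if total > 0:  # isolated nodes keep the uninformative score 0.5
                new_f[i] = sum(w[i][j] * f[j] for j in range(n)) / total
        f = new_f
    return f


def instance_error(instance: Instance, sigma: float) -> float:
    """Fraction of unlabeled points misclassified when the graph uses bandwidth sigma."""
    points, y, labeled_idx = instance
    revealed = {i: y[i] for i in labeled_idx}
    f = harmonic_scores(gaussian_graph(points, sigma), revealed)
    unlabeled = [i for i in range(len(points)) if i not in revealed]
    mistakes = sum(1 for i in unlabeled if (1 if f[i] > 0.5 else 0) != y[i])
    return mistakes / len(unlabeled)


def select_sigma(instances: Sequence[Instance], grid: Sequence[float]) -> float:
    """Return the grid value with the lowest average error over past instances
    (min() keeps the first candidate on ties)."""
    return min(grid, key=lambda s: sum(instance_error(inst, s) for inst in instances) / len(instances))


# Two toy 1-D instances from the same "domain": each has two well-separated clusters.
instances: List[Instance] = [
    ([(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)], [0, 0, 0, 1, 1, 1], [0, 3]),
    ([(1.0,), (1.2,), (8.0,), (8.3,)], [0, 0, 1, 1], [0, 2]),
]
best_sigma = select_sigma(instances, [0.001, 1.0])
print(best_sigma)  # prints 1.0: with sigma = 0.001 every edge weight underflows to 0
```

On this toy data, `sigma = 1.0` keeps each cluster connected while cross-cluster weights are negligible, so every unlabeled point is classified correctly; `sigma = 0.001` leaves every node isolated, and half of the unlabeled points are misclassified. The grid search re-solves every past instance for each candidate value and carries no guarantees of its own; the abstract's claims concern low regret in the online setting and generalization in the distributional setting, which this sketch does not attempt to reproduce.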
Pages: 13