Data driven semi-supervised learning

Cited by: 0

Authors:

Balcan, Maria-Florina [1]

Sharma, Dravyansh [2]

Affiliations:

[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA

[2] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34

Funding:

National Science Foundation (USA)

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider a novel data-driven approach for designing semi-supervised learning algorithms that can effectively learn with only a small number of labeled examples. We focus on graph-based techniques, where the unlabeled examples are connected in a graph under the implicit assumption that similar nodes likely have similar labels. Over the past two decades, several elegant graph-based semi-supervised learning algorithms have been proposed for inferring the labels of the unlabeled examples given the graph and a few labeled examples. However, the problem of how to create the graph (which significantly impacts the practical usefulness of these methods) has been relegated to heuristics and domain-specific art, and no general principles have been proposed. In this work we present a novel data-driven approach for learning the graph and provide strong formal guarantees in both the distributional and online learning formalizations. We show how to leverage problem instances coming from an underlying problem domain to learn the graph hyperparameters for commonly used parametric families of graphs that provably perform well on new instances from the same domain. We obtain low-regret and efficient algorithms in the online setting, and generalization guarantees in the distributional setting. We also show how to combine several very different similarity metrics and learn multiple hyperparameters; our results hold for large classes of problems. We expect some of the tools and techniques we develop along the way to be of independent interest for data-driven algorithms more generally.
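
The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a Gaussian-kernel graph with a single bandwidth `sigma` as the parametric family, uses the standard harmonic-function solution for label inference on the graph, and replaces the paper's learning procedure with a plain grid search over `sigma` that minimizes the average 0-1 loss across several problem instances. All function names (`gaussian_graph`, `harmonic_labels`, `average_loss`, `pick_sigma`) are hypothetical.

```python
import numpy as np

def gaussian_graph(X, sigma):
    """Dense similarity graph: w_ij = exp(-||x_i - x_j||^2 / sigma^2), no self-loops."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / sigma ** 2)
    np.fill_diagonal(W, 0.0)
    return W

def harmonic_labels(W, labeled_idx, y_labeled):
    """Binary label inference via the harmonic solution f_u = L_uu^{-1} W_ul f_l."""
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(y_labeled, dtype=float))
    return unlabeled_idx, (f_u >= 0.5).astype(int)

def average_loss(instances, sigma):
    """Mean 0-1 loss on unlabeled nodes, averaged over problem instances.

    Each instance is a tuple (X, y, labeled_idx); the full label vector y is
    assumed to be available here only for evaluating the hyperparameter.
    """
    losses = []
    for X, y, labeled_idx in instances:
        W = gaussian_graph(X, sigma)
        u, pred = harmonic_labels(W, labeled_idx, y[labeled_idx])
        losses.append(np.mean(pred != y[u]))
    return float(np.mean(losses))

def pick_sigma(instances, grid):
    """Grid search (a simplification): the sigma with the lowest average loss."""
    return min(grid, key=lambda s: average_loss(instances, s))

# Tiny usage example: two well-separated clusters, one labeled point in each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
y = np.array([0, 0, 1, 1])
instances = [(X, y, [0, 2])]
best_sigma = pick_sigma(instances, grid=[0.5, 1.0, 2.0])
```

The grid search only conveys the idea of choosing a graph hyperparameter from multiple instances of the same domain; the formal guarantees mentioned in the abstract (generalization in the distributional setting, low regret online) are not reproduced by this sketch.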


Pages: 13
