Data driven semi-supervised learning

Cited by: 0

Authors:

Balcan, Maria-Florina [1]

Sharma, Dravyansh [2]

Affiliations:

[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA

[2] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Volume 34

Funding:

National Science Foundation (USA);

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider a novel data driven approach for designing semi-supervised learning algorithms that can effectively learn with only a small number of labeled examples. We focus on graph-based techniques, where the unlabeled examples are connected in a graph under the implicit assumption that similar nodes likely have similar labels. Over the past two decades, several elegant graph-based semi-supervised learning algorithms have been proposed for inferring the labels of the unlabeled examples given the graph and a few labeled examples. However, the problem of how to create the graph (which significantly impacts the practical usefulness of these methods) has been relegated to heuristics and domain-specific art, and no general principles have been proposed. In this work we present a novel data driven approach for learning the graph and provide strong formal guarantees in both the distributional and online learning formalizations. We show how to leverage problem instances coming from an underlying problem domain to learn graph hyperparameters, for commonly used parametric families of graphs, that provably perform well on new instances from the same domain. We obtain low regret and efficient algorithms in the online setting, and generalization guarantees in the distributional setting. We also show how to combine several very different similarity metrics and learn multiple hyperparameters; our results hold for large classes of problems. We expect some of the tools and techniques we develop along the way to be of independent interest for data driven algorithms more generally.
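To make the setting in the abstract concrete, the following is a minimal illustrative sketch and not the authors' algorithm. It assumes a Gaussian-kernel complete graph with a single bandwidth hyperparameter `sigma`, binary labels, a simple iterative label-propagation rule, and a plain grid search that picks the `sigma` with the lowest average error over several problem instances. All function names (`gaussian_graph`, `propagate_labels`, `instance_loss`, `select_sigma`) are hypothetical.

```python
import math


def gaussian_graph(points, sigma):
    """Complete graph with weights w_ij = exp(-||x_i - x_j||^2 / sigma^2)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = math.exp(-d2 / sigma ** 2)
    return w


def propagate_labels(w, labels, iters=200):
    """Clamp labeled nodes (0 or 1); each unlabeled node (None) repeatedly
    takes the weighted average of its neighbours' scores."""
    n = len(w)
    f = [0.5 if y is None else float(y) for y in labels]
    for _ in range(iters):
        for i in range(n):
            if labels[i] is None:
                total = sum(w[i])
                if total > 0:
                    f[i] = sum(w[i][j] * f[j] for j in range(n)) / total
    return [1 if v >= 0.5 else 0 for v in f]


def instance_loss(points, labels, truth, sigma):
    """Fraction of unlabeled nodes that are misclassified for this sigma."""
    pred = propagate_labels(gaussian_graph(points, sigma), labels)
    unlabeled = [i for i, y in enumerate(labels) if y is None]
    return sum(pred[i] != truth[i] for i in unlabeled) / len(unlabeled)


def select_sigma(instances, candidates):
    """Assumed simplification: grid search for the candidate sigma with the
    lowest average loss over instances given as (points, labels, truth)."""
    def avg_loss(sigma):
        losses = [instance_loss(p, l, t, sigma) for p, l, t in instances]
        return sum(losses) / len(losses)
    return min(candidates, key=avg_loss)


if __name__ == "__main__":
    # Toy instance: two well-separated clusters on a line, one label each.
    points = [(0.0,), (0.2,), (0.4,), (3.0,), (3.2,), (3.4,)]
    labels = [0, None, None, 1, None, None]
    truth = [0, 0, 0, 1, 1, 1]
    print(select_sigma([(points, labels, truth)], [0.5, 1.0, 2.0]))
```

The grid search here stands in for the learning procedures described in the abstract, which come with regret and generalization guarantees that this sketch does not attempt to reproduce.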


Pages: 13
