Data driven semi-supervised learning

Citations: 0
Authors
Balcan, Maria-Florina [1]
Sharma, Dravyansh [2]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Carnegie Mellon Univ, Dept Comp Sci, Pittsburgh, PA 15213 USA
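Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract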
We consider a novel data driven approach for designing semi-supervised learning algorithms that can effectively learn with only a small number of labeled examples. We focus on graph-based techniques, where the unlabeled examples are connected in a graph under the implicit assumption that similar nodes likely have similar labels. Over the past two decades, several elegant graph-based semi-supervised learning algorithms have been proposed for inferring the labels of the unlabeled examples given the graph and a few labeled examples. However, the problem of how to create the graph (which significantly impacts the practical usefulness of these methods) has been relegated to heuristics and domain-specific art, and no general principles have been proposed. In this work we present a novel data driven approach for learning the graph and provide strong formal guarantees in both the distributional and online learning formalizations. We show how to leverage problem instances coming from an underlying problem domain to learn the graph hyperparameters for commonly used parametric families of graphs that provably perform well on new instances from the same domain. We obtain low regret and efficient algorithms in the online setting, and generalization guarantees in the distributional setting. We also show how to combine several very different similarity metrics and learn multiple hyperparameters; our results hold for large classes of problems. We expect some of the tools and techniques we develop along the way to be of independent interest for data driven algorithms more generally.
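The setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple threshold-graph family (connect two points when their distance is at most `r`), a basic iterative label-propagation rule, and a grid search that picks the `r` with the lowest average error over a set of labeled problem instances. All function names and the toy data are hypothetical and chosen only for illustration.

```python
from math import dist


def threshold_graph(points, r):
    """Neighbor lists for a graph that links points at distance <= r (assumed family)."""
    n = len(points)
    return [
        [j for j in range(n) if j != i and dist(points[i], points[j]) <= r]
        for i in range(n)
    ]


def propagate_labels(neighbors, labeled, iters=200):
    """Iteratively set each unlabeled node's score to the mean of its neighbors' scores.

    `labeled` maps node index -> 0 or 1; labeled scores stay fixed.
    Returns hard 0/1 predictions (score > 0.5 predicts 1).
    """
    n = len(neighbors)
    scores = [float(labeled.get(i, 0.5)) for i in range(n)]
    for _ in range(iters):
        new = scores[:]
        for i in range(n):
            if i not in labeled and neighbors[i]:
                new[i] = sum(scores[j] for j in neighbors[i]) / len(neighbors[i])
        scores = new
    return [1 if s > 0.5 else 0 for s in scores]


def unlabeled_error(points, labeled, truth, r):
    """Fraction of unlabeled nodes that are mispredicted when the graph uses threshold r."""
    preds = propagate_labels(threshold_graph(points, r), labeled)
    unlabeled = [i for i in range(len(points)) if i not in labeled]
    return sum(preds[i] != truth[i] for i in unlabeled) / len(unlabeled)


def pick_threshold(instances, candidates):
    """Grid search: return the candidate r with the lowest average error over instances."""
    def avg_err(r):
        return sum(unlabeled_error(p, l, t, r) for p, l, t in instances) / len(instances)
    return min(candidates, key=avg_err)


# Toy instance (hypothetical): two well-separated 1-D clusters, one label per cluster.
points = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
labeled = {0: 0, 5: 1}
truth = [0, 0, 0, 1, 1, 1]
best_r = pick_threshold([(points, labeled, truth)], candidates=[0.05, 0.15, 2.0])
print(best_r)
```

In this toy case a very small threshold leaves the graph disconnected and a very large one connects everything, so neither carries useful label information; the grid search only stands in for the learning procedures with formal guarantees that the abstract refers to.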
Pages: 13