RETRIEVE: Coreset Selection for Efficient and Robust Semi-Supervised Learning

Cited by: 0
Authors
Killamsetty, Krishnateja [1]
Zhao, Xujiang [1]
Chen, Feng [1]
Iyer, Rishabh [1]
Affiliations
[1] Univ Texas Dallas, Dept Comp Sci, Richardson, TX 75083 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Funding
National Science Foundation (USA)
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semi-supervised learning (SSL) algorithms have had great success in recent years in limited labeled data regimes. However, the current state-of-the-art SSL algorithms are computationally expensive and entail significant compute time and energy requirements. This can prove to be a huge limitation for many smaller companies and academic groups. Our main insight is that training on a subset of the unlabeled data instead of the entire unlabeled set enables current SSL algorithms to converge faster, significantly reducing computational costs. In this work, we propose RETRIEVE, a coreset selection framework for efficient and robust semi-supervised learning. RETRIEVE selects the coreset by solving a mixed discrete-continuous bi-level optimization problem such that the selected coreset minimizes the labeled set loss. We use a one-step gradient approximation and show that the discrete optimization problem is approximately submodular, enabling simple greedy algorithms to obtain the coreset. We empirically demonstrate on several real-world datasets that existing SSL algorithms such as VAT, Mean-Teacher, and FixMatch, when used with RETRIEVE, achieve (a) faster training times and (b) better performance when the unlabeled data contains Out-of-Distribution (OOD) data or is imbalanced. More specifically, we show that with minimal accuracy degradation, RETRIEVE achieves a speedup of around 3x in the traditional SSL setting and a speedup of 5x compared to state-of-the-art (SOTA) robust SSL algorithms in the case of imbalance and OOD data. RETRIEVE is available as a part of the CORDS toolkit: https://github.com/decile-team/cords.
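The selection idea in the abstract (greedily choosing unlabeled points so that one gradient step on them lowers the labeled-set loss) can be illustrated with a small toy sketch. This is not the authors' implementation and does not use the CORDS API: the linear model, the squared-error labeled loss, and the names `labeled_loss`, `greedy_coreset`, `unlabeled_grads`, and `lr` are all assumptions made for illustration only.

```python
import numpy as np


def labeled_loss(theta, X_l, y_l):
    """Toy labeled-set loss: half the mean squared error of a linear model."""
    residual = X_l @ theta - y_l
    return 0.5 * float(np.mean(residual ** 2))


def greedy_coreset(theta, unlabeled_grads, X_l, y_l, budget, lr=0.1):
    """Greedily pick `budget` indices of unlabeled examples (hypothetical helper).

    Each round adds the index whose gradient, when included in a single
    gradient-descent step from `theta`, yields the lowest labeled-set loss.
    """
    selected = []
    grad_sum = np.zeros_like(theta)
    remaining = set(range(len(unlabeled_grads)))
    for _ in range(budget):
        best_idx, best_loss = None, np.inf
        for i in sorted(remaining):
            # One-step approximation: evaluate the labeled loss after a
            # single update that uses the current subset plus candidate i.
            candidate = theta - lr * (grad_sum + unlabeled_grads[i])
            loss = labeled_loss(candidate, X_l, y_l)
            if loss < best_loss:
                best_idx, best_loss = i, loss
        selected.append(best_idx)
        grad_sum += unlabeled_grads[best_idx]
        remaining.remove(best_idx)
    return selected


# Tiny hand-made example (assumed data, not from the paper).
theta0 = np.zeros(2)
X_l = np.eye(2)
y_l = np.array([1.0, 0.0])
unlabeled_grads = np.array([[-1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
print(greedy_coreset(theta0, unlabeled_grads, X_l, y_l, budget=2, lr=1.0))
```

In this toy setup the first pick is the example whose gradient moves the parameters directly toward the labeled targets. A real SSL pipeline would obtain per-example gradients from the model's unlabeled loss and would typically restrict them to a few layers to keep selection cheap; those details are outside this sketch.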
Pages: 14