Boosting Representation Learning via Similarity-based Active Data Sampling from the Web

Authors
Ueno, Shiryu [1]
Kato, Kunihito [2]
Affiliations
[1] Gifu Univ, Dept Nat Sci & Technol, Gifu, Japan
[2] Gifu Univ, Fac Engn, Gifu, Japan
Keywords
Representation Learning; Self-Supervised Learning; Deep Active Learning
DOI
10.1109/IWIS62722.2024.10706063
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Training deep learning models requires a huge dataset, and constructing such a dataset requires human annotation. However, the larger the dataset, the greater the annotation cost. Self-Supervised Learning (SSL), a form of representation learning, addresses this problem by using mechanically generated pseudo-labels for training. Compared to supervised learning, SSL eliminates the need for human annotation, thereby reducing the cost of dataset construction and facilitating training on large-scale datasets. However, when varying capture conditions make it difficult to collect a large number of images, the dataset shrinks and SSL performance degrades. This study proposes a method that combines image similarity search with a diversity-based query strategy from Deep Active Learning to selectively augment the training dataset with images collected from the web. The similarity search collects images that are highly relevant to the downstream task, while the diversity-based query strategy excludes images that are too similar to those already in the training dataset, thereby maintaining dataset diversity. Experiments demonstrate that the method enhances SSL performance, particularly when the training dataset for the downstream task is limited.
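
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the embedding model, the similarity measure, or the exact query strategy, so cosine similarity, the thresholds `min_sim` and `max_sim`, and the function name `select_web_images` are all assumptions made for illustration. A simple similarity ceiling stands in for the diversity-based query strategy.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_web_images(train, web, k, min_sim=0.5, max_sim=0.95):
    """Return indices of up to k web embeddings to add to the training set.

    train, web: lists of embedding vectors (hypothetical inputs).
    min_sim:    assumed relevance floor (similarity search).
    max_sim:    assumed near-duplicate ceiling (diversity filter).
    """
    # Relevance: score each web candidate by its closest training embedding.
    scored = [(max(cosine(w, t) for t in train), i) for i, w in enumerate(web)]
    scored.sort(key=lambda s: -s[0])  # most relevant first

    selected = []
    for score, i in scored:
        if len(selected) == k or score < min_sim:
            break  # enough images, or the rest are not relevant enough
        if score > max_sim:
            continue  # near-duplicate of an existing training image
        # Diversity: skip candidates too close to ones already selected.
        if any(cosine(web[i], web[j]) > max_sim for j in selected):
            continue
        selected.append(i)
    return selected
```

In this sketch, candidates are visited in order of relevance, and a candidate is kept only if it is neither a near-duplicate of a training image nor of an image already selected. A real pipeline would likely use an approximate nearest-neighbor index instead of the exhaustive comparison shown here.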
Pages: 6