Boosting Representation Learning via Similarity-based Active Data Sampling from the Web

Times cited: 0
Authors
Ueno, Shiryu [1]
Kato, Kunihito [2]
Affiliations
[1] Gifu Univ, Dept Nat Sci & Technol, Gifu, Japan
[2] Gifu Univ, Fac Engn, Gifu, Japan
Keywords
Representation Learning; Self-Supervised Learning; Deep Active Learning
DOI
10.1109/IWIS62722.2024.10706063
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Training deep learning models requires large datasets, and constructing them requires human annotation, whose cost grows with dataset size. Self-Supervised Learning (SSL), a form of representation learning, addresses this problem by training on pseudo-labels generated automatically from the data. Because SSL eliminates the need for human annotation, it reduces the cost of dataset construction and makes training on large-scale datasets easier than with supervised learning. However, when varying capture conditions make it difficult to collect many images, the dataset shrinks and SSL performance degrades. This study proposes a method that combines image similarity search with a diversity-based query strategy from Deep Active Learning to selectively augment the training dataset with images collected from the web. Similarity search collects images highly relevant to the downstream task, while the diversity-based query strategy excludes images that are too similar to those already in the training dataset, thereby maintaining dataset diversity. Experiments demonstrate that our method improves SSL performance, particularly when the training dataset for the downstream task is limited.
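The abstract does not specify the similarity measure, the thresholds, or which diversity-based query strategy is used. The following is therefore only a minimal sketch of one plausible reading, not the authors' implementation.

It makes the following assumptions:

* Each image is already represented by an embedding vector, for example from an SSL encoder.
* Relevance is the highest cosine similarity between a web candidate and any training image, which amounts to a nearest-neighbour similarity search.
* The near-duplicate cutoff and a greedy k-center (Core-Set style) selection together stand in for the diversity-based query strategy.

The names `filter_candidates`, `k_center_greedy`, `relevance_min`, `duplicate_max` and `budget` are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filter_candidates(train: List[Vector], web: List[Vector],
                      relevance_min: float, duplicate_max: float) -> List[int]:
    """Hypothetical similarity-search step (assumes a non-empty training set).

    A web image is kept if its best cosine similarity to any training image is
    high enough to be relevant but below an assumed near-duplicate cutoff.
    """
    kept = []
    for i, w in enumerate(web):
        best = max(cosine(w, t) for t in train)
        if relevance_min <= best < duplicate_max:
            kept.append(i)
    return kept


def k_center_greedy(train: List[Vector], web: List[Vector],
                    candidates: List[int], budget: int) -> List[int]:
    """Greedy k-center selection, used here as a stand-in diversity strategy.

    Repeatedly picks the candidate whose cosine distance to its nearest
    already-pooled image (training set plus earlier picks) is largest.
    """
    # Distance from each candidate to its nearest training image.
    dist = {i: min(1.0 - cosine(web[i], t) for t in train) for i in candidates}
    chosen = []
    while dist and len(chosen) < budget:
        far = max(dist, key=dist.get)  # ties resolve to the earliest candidate
        chosen.append(far)
        del dist[far]
        # The newly chosen image now also counts as part of the pool.
        for i in dist:
            dist[i] = min(dist[i], 1.0 - cosine(web[i], web[far]))
    return chosen
```

In the usage below, the training set has two images, `[1, 0]` and `[0, 1]`. The five web candidates are `[1, 0]`, `[1, 1]`, `[-1, 0]`, `[0.8, 0.2]` and `[0.2, 0.8]`.

Calling `filter_candidates(train, web, 0.5, 0.99)` keeps indices `[1, 3, 4]`:

* Candidate 0 is dropped as an exact duplicate.
* Candidate 2 is dropped as irrelevant.

With `budget=2`, `k_center_greedy` then returns `[1, 3]`: it first picks the candidate farthest from the training set. Separating the relevance filter from the diversity step keeps each threshold easy to tune on its own.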
Pages: 6
Related papers
50 in total
  • [21] Similarity-Based Chained Transfer Learning for Energy Forecasting With Big Data
    Tian, Yifang
    Sehovac, Ljubisa
    Grolinger, Katarina
    IEEE ACCESS, 2019, 7 : 139895 - 139908
  • [22] Similarity-based data reduction and classification
    Guo, GD
    Wang, H
    Bell, D
    Liao, ZN
    Monitoring, Security, and Rescue Techniques in Multiagent Systems, 2005, : 227 - 238
  • [23] User segmentation via interpretable user representation and relative similarity-based segmentation method
    Lee, Younghoon
    Cho, Sungzoon
    MULTIMEDIA SYSTEMS, 2021, 27 (01) : 61 - 72
  • [24] Deep Similarity-Based Batch Mode Active Learning with Exploration-Exploitation
    Yin, Changchang
    Qian, Buyue
    Cao, Shilei
    Li, Xiaoyu
    Wei, Jishang
    Zheng, Qinghua
    Davidson, Ian
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2017, : 575 - 584
  • [26] Similarity-based soft clustering algorithm for web documents
    School of Remote Sensing Information Engineering, Wuhan University, Wuhan 430079, China
    Jisuanji Gongcheng, 2006, 2 (59-61):
  • [27] Semantic Similarity-Based Web Services Access Control
    Zhao, Yi
    Wang, Xia
    AUTONOMOUS SYSTEMS: DEVELOPMENTS AND TRENDS, 2011, 391 : 339 - +
  • [28] Ranking approaches for similarity-based web element location
    Coppola, Riccardo
    Feldt, Robert
    Nass, Michel
    Alegroth, Emil
    JOURNAL OF SYSTEMS AND SOFTWARE, 2025, 222
  • [29] Research on similarity-based semantic web services discovery
    Chen, W.-Y.
    Zhang, Z.-Q.
    Xiang, T.
    Sang, N.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2010, 39 (06): : 896 - 899+910
  • [30] Dynamic simulation of gas turbines via feature similarity-based transfer learning
    Zhou, Dengji
    Hao, Jiarui
    Huang, Dawen
    Jia, Xingyun
    Zhang, Huisheng
    Frontiers in Energy, 2020, 14 : 817 - 835