Boosting Representation Learning via Similarity-based Active Data Sampling from the Web

Cited by: 0
Authors
Ueno, Shiryu [1]
Kato, Kunihito [2]
Affiliations
[1] Gifu Univ, Dept Nat Sci & Technol, Gifu, Japan
[2] Gifu Univ, Fac Engn, Gifu, Japan
Keywords
Representation Learning; Self-Supervised Learning; Deep Active Learning
DOI
10.1109/IWIS62722.2024.10706063
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Training deep learning models requires a large dataset, and constructing that dataset requires human annotation. However, the larger the dataset, the greater the annotation cost. Self-Supervised Learning (SSL), a form of representation learning, addresses this problem by training on mechanically generated pseudo-labels. Compared with supervised learning, SSL eliminates the need for human annotation, thereby reducing the cost of dataset construction and facilitating training on large-scale datasets. However, when varying capture conditions make it difficult to collect a large number of images, the dataset shrinks and SSL performance degrades. This study proposes a method that combines image similarity search with a diversity-based query strategy for Deep Active Learning, selectively augmenting the training dataset with images collected from the web. Similarity search lets our approach collect images highly relevant to the downstream task, while the diversity-based query strategy excludes images that are too similar to those already in the training dataset, thereby maintaining dataset diversity. Experiments demonstrate that our method enhances SSL performance, particularly when the training dataset for the downstream task is limited.
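The selection idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the embedding model, the similarity measure, or the exact query strategy, so everything below is an assumption made for illustration: cosine similarity over precomputed embeddings, a hypothetical `top_k` relevance cut, a hypothetical `max_sim` near-duplicate threshold, and k-center greedy as one common diversity-based query strategy. It is not the authors' implementation.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def select_web_images(train_emb, web_emb, budget, top_k=100, max_sim=0.95):
    """Pick up to `budget` web candidates (hypothetical helper, illustration only).

    Assumed pipeline, not confirmed by the source:
      1. relevance: keep the `top_k` candidates closest to any training image;
      2. de-duplication: drop candidates whose similarity exceeds `max_sim`;
      3. diversity: k-center greedy over the remaining candidates.
    """
    train = l2_normalize(train_emb)
    web = l2_normalize(web_emb)

    # Similarity of every web image to its nearest training image.
    nearest_sim = (web @ train.T).max(axis=1)

    # Step 1: similarity search keeps the most relevant candidates.
    order = np.argsort(-nearest_sim, kind="stable")[:top_k]

    # Step 2: exclude images that are too similar to the training set.
    cand = order[nearest_sim[order] <= max_sim]

    # Step 3: k-center greedy. Repeatedly take the candidate farthest
    # (in cosine distance) from the training set and earlier picks.
    min_dist = 1.0 - nearest_sim[cand]
    selected = []
    for _ in range(min(budget, len(cand))):
        pick = int(np.argmax(min_dist))
        selected.append(int(cand[pick]))
        dist_to_pick = 1.0 - web[cand] @ web[cand[pick]]
        min_dist = np.minimum(min_dist, dist_to_pick)
        min_dist[pick] = -np.inf  # never pick the same candidate twice
    return selected
```

The function returns indices into the web candidate array. In this sketch `top_k` controls relevance to the downstream task and `max_sim` controls how aggressively near-duplicates are discarded; both values would need tuning for a real dataset.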
Pages: 6