Boosting Representation Learning via Similarity-based Active Data Sampling from the Web

Cited by: 0
Authors
Ueno, Shiryu [1]
Kato, Kunihito [2]
Affiliations
[1] Gifu Univ, Dept Nat Sci & Technol, Gifu, Japan
[2] Gifu Univ, Fac Engn, Gifu, Japan
Keywords
Representation Learning; Self-Supervised Learning; Deep Active Learning
DOI
10.1109/IWIS62722.2024.10706063
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Training deep learning models requires a large dataset, and constructing one requires human annotation. However, the larger the dataset, the greater the annotation cost. Self-Supervised Learning (SSL), a form of representation learning, addresses this problem by training on mechanically generated pseudo-labels. Compared with supervised learning, SSL eliminates the need for human annotation, reducing the cost of dataset construction and facilitating training on large-scale datasets. However, when varying capture conditions make it difficult to collect a large number of images, the dataset shrinks and SSL performance degrades. This study proposes a method that combines image similarity search with a diversity-based query strategy from Deep Active Learning to selectively augment the training dataset with images collected from the web. Similarity search collects images that are highly relevant to the downstream task, while the diversity-based query strategy excludes images that are too similar to those already in the training dataset, thereby maintaining dataset diversity. Experiments demonstrate that the method enhances SSL performance, particularly when the training dataset for the downstream task is limited.
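The abstract does not specify the selection algorithm, so the following is only a minimal sketch of one way the two stages could fit together, not the authors' implementation. It assumes images have already been embedded as feature vectors, uses cosine similarity for the similarity search, and substitutes a greedy k-center selection as the diversity-based query strategy. The function name `select_web_images` and the parameters `budget`, `min_sim`, and `max_sim` are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def select_web_images(train_emb, web_emb, budget, min_sim=0.5, max_sim=0.95):
    """Return indices of web candidates to add to the training set.

    budget, min_sim, and max_sim are illustrative assumptions,
    not values taken from the paper.
    """
    train = l2_normalize(np.asarray(train_emb, dtype=float))
    web = l2_normalize(np.asarray(web_emb, dtype=float))

    # Similarity search: cosine similarity of each web candidate
    # to its nearest training image.
    nearest_sim = (web @ train.T).max(axis=1)

    # Keep candidates relevant to the task (>= min_sim) and drop those
    # nearly identical to an existing training image (> max_sim).
    pool = np.flatnonzero((nearest_sim >= min_sim) & (nearest_sim <= max_sim))

    # Diversity-based query (greedy k-center): repeatedly pick the candidate
    # farthest, in cosine distance, from the training set and earlier picks.
    min_dist = 1.0 - nearest_sim[pool]
    selected = []
    for _ in range(min(budget, len(pool))):
        j = int(np.argmax(min_dist))
        selected.append(int(pool[j]))
        dist_to_new = 1.0 - web[pool] @ web[pool[j]]
        min_dist = np.minimum(min_dist, dist_to_new)
        min_dist[j] = -np.inf  # never pick the same candidate twice
    return selected


if __name__ == "__main__":
    train = [[1.0, 0.0]]
    web = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.0, 1.0]]
    print(select_web_images(train, web, budget=2))
```

In this toy run, candidate 0 duplicates a training image and candidate 3 is unrelated to it, so only candidates 1 and 2 remain eligible. The k-center step then orders them by how much new coverage each adds.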
Pages: 6