Multi-view and multi-augmentation for self-supervised visual representation learning

Cited by: 2
Authors
Tran, Van Nhiem [1]
Huang, Chi-En [2]
Liu, Shen-Hsuan [2]
Aslam, Muhammad Saqlain [2]
Yang, Kai-Lin [2]
Li, Yung-Hui [2]
Wang, Jia-Ching [1]
Affiliations
[1] Natl Cent Univ, Dept Comp Sci & Informat Engn, Taoyuan 32001, Taiwan
[2] Hon Hai Res Inst, AI Res Ctr, Taipei 114699, Taiwan
Keywords
Multi-augmentation; SSL augmentation pipelines; Data augmentation policies; Nuisance factors; Scale-invariant representation learning; Metric learning
DOI
10.1007/s10489-023-05163-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the real world, the appearance of identical objects depends on factors as varied as resolution, angle, illumination conditions, and viewing perspectives. This suggests that the data augmentation pipeline could benefit downstream tasks by exploring the overall data appearance in a self-supervised framework. Previous work on self-supervised learning that yields outstanding performance relies heavily on data augmentation such as cropping and color distortion. However, most methods use a static data augmentation pipeline, limiting the amount of feature exploration. To generate representations that encompass scale-invariant, explicit information about various semantic features and are invariant to nuisance factors such as relative object location, brightness, and color distortion, we propose the Multi-View, Multi-Augmentation (MVMA) framework. MVMA consists of multiple augmentation pipelines, with each pipeline comprising an assortment of augmentation policies. By refining the baseline self-supervised framework to investigate a broader range of image appearances through modified loss objective functions, MVMA enhances the exploration of image features through diverse data augmentation techniques. Transferring the resultant representation learning using convolutional networks (ConvNets) to downstream tasks yields significant improvements compared to the state-of-the-art DINO across a wide range of vision tasks and classification tasks: +4.1% and +8.8% top-1 on the ImageNet dataset with linear evaluation and k-NN classifier, respectively. Moreover, MVMA achieves a significant improvement of +5% AP(50) and +7% AP(50)(m) on COCO object detection and segmentation.
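The idea of several augmentation pipelines, each applying its own assortment of policies to produce differently scaled views, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the policy functions, the `PIPELINES` settings, the toy grayscale image format, and the DINO-style cross-view loss are hypothetical stand-ins, since the abstract does not specify the policies or the modified loss that MVMA actually uses.

```python
import math
import random

# Hypothetical policies acting on a grayscale image stored as a list of rows,
# with pixel values in [0, 1]. None of these names come from the paper.
def random_crop(img, scale, rng):
    """Crop a window covering `scale` of each side at a random position."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * scale)), max(1, int(w * scale))
    top = rng.randint(0, h - ch)
    left = rng.randint(0, w - cw)
    return [row[left:left + cw] for row in img[top:top + ch]]

def jitter_brightness(img, max_delta, rng):
    """Shift all pixels by one random offset, clamped to [0, 1]."""
    delta = rng.uniform(-max_delta, max_delta)
    return [[min(1.0, max(0.0, p + delta)) for p in row] for row in img]

def random_flip(img, rng):
    """Mirror the image horizontally with probability 0.5."""
    return [row[::-1] for row in img] if rng.random() < 0.5 else img

def make_pipeline(scale, max_delta):
    """Bundle a fixed assortment of policies into one augmentation pipeline."""
    def pipeline(img, rng):
        out = random_crop(img, scale, rng)
        out = jitter_brightness(out, max_delta, rng)
        return random_flip(out, rng)
    return pipeline

# Assumed settings: one full-size view and two smaller, more distorted views.
PIPELINES = [
    make_pipeline(scale=1.0, max_delta=0.1),
    make_pipeline(scale=0.75, max_delta=0.4),
    make_pipeline(scale=0.5, max_delta=0.2),
]

def multi_view(img, pipelines, rng):
    """Produce one view of the same image per pipeline."""
    return [pipeline(img, rng) for pipeline in pipelines]

def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def cross_view_loss(student_logits, teacher_logits, t_student=0.1, t_teacher=0.04):
    """Average cross-entropy between teacher view i and student view j, i != j.

    This follows the general DINO recipe; it is not the paper's modified loss.
    """
    total, pairs = 0.0, 0
    for i, t in enumerate(teacher_logits):
        p_teacher = [math.exp(v) for v in log_softmax(t, t_teacher)]
        for j, s in enumerate(student_logits):
            if i == j:
                continue
            log_p_student = log_softmax(s, t_student)
            total -= sum(p * lp for p, lp in zip(p_teacher, log_p_student))
            pairs += 1
    return total / pairs
```

In a real setup the logits would come from student and teacher networks applied to each view; here any list of per-view logit vectors can be passed to `cross_view_loss` to see how the pairwise objective is assembled.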
Pages: 629-656
Number of pages: 28
Related papers
50 in total
  • [1] Multi-view and multi-augmentation for self-supervised visual representation learning
    Van Nhiem Tran
    Chi-En Huang
    Shen-Hsuan Liu
    Muhammad Saqlain Aslam
    Kai-Lin Yang
    Yung-Hui Li
    Jia-Ching Wang
    [J]. Applied Intelligence, 2024, 54: 629-656
  • [2] MULTI-AUGMENTATION FOR EFFICIENT SELF-SUPERVISED VISUAL REPRESENTATION LEARNING
    Tran, Van Nhiem
    Huang, Chi-En
    Liu, Shen-Hsuan
    Yang, Kai-Lin
    Ko, Timothy
    Li, Yung-Hui
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (IEEE ICMEW 2022), 2022
  • [3] Self-supervised learning for multi-view stereo
    Ito S.
    Kaneko N.
    Sumi K.
    [J]. Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering, 2020, 86 (12): 1042-1050
  • [4] CoSleep: A Multi-View Representation Learning Framework for Self-Supervised Learning of Sleep Stage Classification
    Ye, Jianan
    Xiao, Qinfeng
    Wang, Jing
    Zhang, Hongjun
    Deng, Jiaoxue
    Lin, Youfang
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29: 189-193
  • [5] Generation-based Multi-view Contrast for Self-supervised Graph Representation Learning
    Han, Yuehui
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (05)
  • [6] Self-supervised Multi-view Stereo via View Synthesis Representation Consistency
    Zhang, Hang
    Cao, Jie
    Wu, Xingming
    Liu, Zhong
    Chen, Weihai
    [J]. 2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023: 876-881
  • [7] Self-supervised Learning of Depth Inference for Multi-view Stereo
    Yang, Jiayu
    Alvarez, Jose M.
    Liu, Miaomiao
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 7522-7530
  • [8] MVEB: Self-Supervised Learning With Multi-View Entropy Bottleneck
    Wen, Liangjian
    Wang, Xiasi
    Liu, Jianzhuang
    Xu, Zenglin
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (09): 6097-6108
  • [9] Self-Supervised Discriminative Feature Learning for Deep Multi-View Clustering
    Xu, Jie
    Ren, Yazhou
    Tang, Huayi
    Yang, Zhimeng
    Pan, Lili
    Yang, Yang
    Pu, Xiaorong
    Yu, Philip S.
    He, Lifang
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (07): 7470-7482
  • [10] Self Supervised Multi-view Graph Representation Learning in Digital Pathology
    Ramanathan, Vishwesh
    Martel, Anne L.
    [J]. GRAPHS IN BIOMEDICAL IMAGE ANALYSIS, AND OVERLAPPED CELL ON TISSUE DATASET FOR HISTOPATHOLOGY, 5TH MICCAI WORKSHOP, 2024, 14373: 74-84