Repeat and learn: Self-supervised visual representations learning by Scene Localization

Cited by: 0
Authors
Altabrawee, Hussein [1, 2]
Noor, Mohd Halim Mohd [1]
Affiliations
[1] Univ Sains Malaysia, Sch Comp Sci, Main Campus, Gelugor 11800, Penang, Malaysia
[2] Al Muthanna Univ, Comp Ctr, Main Campus, Samawah 66001, Al Muthanna, Iraq
Keywords
Visual representations learning; Action recognition; Self-supervised learning;
DOI
10.1016/j.patcog.2024.110804
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large labeled datasets are crucial for progress in video understanding. However, the labeling process is time-consuming, expensive, and tedious. To overcome this impediment, various pretext tasks use the temporal coherence in videos to learn visual representations in a self-supervised manner. However, these pretext tasks (order verification and sequence sorting) struggle with cyclic actions because of the label ambiguity problem. To overcome these limitations, we present a novel temporal pretext task for self-supervised learning of visual representations from unlabeled videos. Repeated Scene Localization (RSL) is a multi-class classification pretext task that changes the temporal order of the frames in a video by repeating a scene. The network is then trained to identify the modified videos, localize the repeated scene, and identify the unmodified original videos that have no repeated scenes. We evaluated the proposed pretext task on two benchmark datasets, UCF-101 and HMDB-51. The experimental results show that it achieves state-of-the-art results in action recognition and video retrieval tasks. In action recognition, our S3D model achieves 88.15% and 56.86% on UCF-101 and HMDB-51, respectively, outperforming the current state of the art by 1.05% and 3.26%. Our R(2+1)D-Adjacent model achieves 83.52% and 54.50% on UCF-101 and HMDB-51, respectively, outperforming the single pretext tasks by 8.7% and 13.9%. In video retrieval, our R(2+1)D-Offset model outperforms the single pretext tasks by 4.68% and 1.1% in Top-1 accuracy on UCF-101 and HMDB-51, respectively. The source code and the trained models are publicly available at https://github.com/Hussein-A-Hassan/RSL-Pretext.
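The abstract describes RSL only at a high level, so the following is a minimal sketch of how such a pretext sample might be built, not the authors' implementation. It assumes a clip is split into equal-length scenes, that class 0 marks an unmodified clip, and that class k > 0 means scene k was overwritten by a copy of the scene just before it. The names `make_rsl_sample` and `sample_rsl_example`, the parameter `num_scenes`, and the equal-length, keep-the-clip-length design are all hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_rsl_sample(frames: Sequence[Frame], num_scenes: int, label: int) -> List[Frame]:
    """Build one pretext clip for a given class label (illustrative assumption).

    label == 0: return the frames unchanged (original clip).
    label == k > 0: overwrite scene k with a copy of scene k - 1, so the
    preceding scene appears twice and the clip length stays the same.
    """
    if num_scenes < 2:
        raise ValueError("num_scenes must be at least 2")
    if len(frames) % num_scenes != 0:
        raise ValueError("clip length must be divisible by num_scenes")
    if not 0 <= label < num_scenes:
        raise ValueError("label must be in [0, num_scenes)")

    scene_len = len(frames) // num_scenes
    clip = list(frames)
    if label == 0:
        return clip

    start = label * scene_len
    # Copy the previous scene over the scene that starts at `start`.
    clip[start:start + scene_len] = clip[start - scene_len:start]
    return clip


def sample_rsl_example(
    frames: Sequence[Frame], num_scenes: int, rng: random.Random
) -> Tuple[List[Frame], int]:
    """Draw a class uniformly at random and return (clip, label)."""
    label = rng.randrange(num_scenes)
    return make_rsl_sample(frames, num_scenes, label), label


if __name__ == "__main__":
    # Integers stand in for video frames.
    print(make_rsl_sample(list(range(8)), num_scenes=4, label=2))
    # → [0, 1, 2, 3, 2, 3, 6, 7]
```

Under these assumptions a classifier with `num_scenes` output classes would be trained on the `(clip, label)` pairs with a standard cross-entropy loss. A modified clip always contains a duplicated scene regardless of whether the underlying action is cyclic, which is the property the abstract contrasts with order verification and sequence sorting.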
Pages: 10