Unsupervised Learning of Visual Representations via Rotation and Future Frame Prediction for Video Retrieval

Cited by: 2
Authors
Kumar, Vidit [1]
Tripathi, Vikas [1]
Pant, Bhaskar [1]
Affiliations
[1] Graph Era Deemed Univ, Dehra Dun, Uttarakhand, India
Keywords
Content-based search; Deep learning; Self-supervised learning; Unsupervised learning; Video retrieval
DOI
10.1007/978-3-030-81462-5_61
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Due to rapid technological advancements, the number of videos uploaded to the internet has grown exponentially. Most of these videos lack semantic tags, which makes indexing and retrieval challenging and calls for effective content-based analysis techniques. Meanwhile, supervised representation learning from large-scale labeled datasets has demonstrated great success in the image domain. However, creating such a large-scale labeled database for videos is expensive and time-consuming. To this end, we propose an unsupervised visual representation learning framework that aims to learn spatiotemporal features by exploiting two pretext tasks, i.e., rotation prediction and future frame prediction. The performance of the learned features is analyzed on the nearest-neighbor task (video retrieval), with experiments conducted on the UCF-101 dataset. The experimental results show that our method achieves competitive performance.
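The nearest-neighbor evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say which similarity measure or protocol was used, so cosine similarity, the function names `cosine_similarity` and `top_k_accuracy`, and the toy feature vectors are all assumptions made for illustration. A query counts as a hit when any of its k most similar gallery clips shares its class label.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_accuracy(query_feats, query_labels, gallery_feats, gallery_labels, k=1):
    """Fraction of queries whose k nearest gallery items include the same label."""
    hits = 0
    for feat, label in zip(query_feats, query_labels):
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_feats)),
            key=lambda i: cosine_similarity(feat, gallery_feats[i]),
            reverse=True,
        )
        if any(gallery_labels[i] == label for i in ranked[:k]):
            hits += 1
    return hits / len(query_feats)


# Toy, hand-made features standing in for learned clip embeddings (hypothetical).
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
gallery_labels = ["a", "b", "b"]
query_feats = [[0.9, 0.1], [0.6, 0.8]]
query_labels = ["a", "a"]

top1 = top_k_accuracy(query_feats, query_labels, gallery_feats, gallery_labels, k=1)
top3 = top_k_accuracy(query_feats, query_labels, gallery_feats, gallery_labels, k=3)
print(top1, top3)
```

In practice the features would come from the pretrained network, with test-split clips as queries and training-split clips as the gallery; that split is a common convention for UCF-101 retrieval and is likewise an assumption here.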
Pages: 701-710
Page count: 10