Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

Cited by: 6
Authors
Chen, Minghao [1]
Wei, Fangyun [2]
Li, Chong [2]
Cai, Deng [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, State Key Lab CAD&CG, Hangzhou, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
DOI
10.1109/CVPR52688.2022.01343
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Prior works on action representation learning mainly focus on designing various architectures to extract global representations for short video clips. In contrast, many practical applications, such as video alignment, have a strong demand for learning dense representations for long videos. In this paper, we introduce a novel contrastive action representation learning (CARL) framework to learn frame-wise action representations, especially for long videos, in a self-supervised manner. Concretely, we introduce a simple yet efficient video encoder that considers spatio-temporal context to extract frame-wise representations. Inspired by the recent progress of self-supervised learning, we present a novel sequence contrastive loss (SCL) applied on two correlated views obtained through a series of spatio-temporal data augmentations. SCL optimizes the embedding space by minimizing the KL-divergence between the sequence similarity of two augmented views and a prior Gaussian distribution of timestamp distance. Experiments on the FineGym, PennAction and Pouring datasets show that our method outperforms the previous state of the art by a large margin for downstream fine-grained action classification. Surprisingly, although it is not trained on paired videos, our approach also shows outstanding performance on video alignment and fine-grained frame retrieval tasks. Code and models are available at https://github.com/minghchen/CARL_code.
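The SCL objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that); the function names, the temperature `tau`, the width `sigma`, and the direction of the KL term (prior relative to predicted similarity) are all assumptions made for illustration. For each frame of the first view, the sketch builds a Gaussian target over the frames of the second view from their timestamp distances, and compares it with the softmax of embedding similarities.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def gaussian_prior_log(t1, t2, sigma=1.0):
    # Log of a row-normalized Gaussian over timestamp distances.
    # sigma is a hypothetical width parameter, not taken from the source.
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    sq_dist = (t1[:, None] - t2[None, :]) ** 2
    return log_softmax(-sq_dist / (2.0 * sigma ** 2), axis=1)

def sequence_contrastive_loss(z1, z2, t1, t2, sigma=1.0, tau=0.1):
    # z1, z2: (T1, D) and (T2, D) frame embeddings of the two augmented views.
    # t1, t2: timestamps of those frames in the original video.
    # tau is a hypothetical temperature, not taken from the source.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    log_pred = log_softmax(z1 @ z2.T / tau, axis=1)   # predicted similarity
    log_prior = gaussian_prior_log(t1, t2, sigma)     # Gaussian target
    # KL(prior || predicted), averaged over the frames of view 1 (assumed direction).
    kl = (np.exp(log_prior) * (log_prior - log_pred)).sum(axis=1)
    return float(kl.mean())

# Toy usage: one-hot "embeddings" that match the timestamps give a lower
# loss than the same embeddings in reversed temporal order.
z = np.eye(4)
t = np.arange(4.0)
aligned = sequence_contrastive_loss(z, z, t, t)
misaligned = sequence_contrastive_loss(z, z[::-1], t, t)
```

In this toy setting the aligned pair should score lower than the reversed pair, which is the behavior the abstract attributes to the loss: frames close in time are pulled together and frames far apart are pushed away in proportion to the Gaussian prior.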
Pages: 13791-13800
Number of pages: 10