Long-Short Temporal Contrastive Learning of Video Transformers

Cited by: 12
Authors
Wang, Jue [1]
Bertasius, Gedas [2]
Tran, Du [1]
Torresani, Lorenzo [1,3]
Affiliations
[1] Facebook AI Research, Menlo Park, CA 94025, USA
[2] University of North Carolina, Chapel Hill, NC, USA
[3] Dartmouth College, Hanover, NH, USA
DOI: 10.1109/CVPR52688.2022.01362
Chinese Library Classification: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Video transformers have recently emerged as a competitive alternative to 3D CNNs for video understanding. However, due to their large number of parameters and reduced inductive biases, these models require supervised pretraining on large-scale image datasets to achieve top performance. In this paper, we empirically demonstrate that self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results that are on par with or better than those obtained with supervised pretraining on large-scale image datasets, even massive ones such as ImageNet-21K. Since transformer-based models are effective at capturing dependencies over extended temporal spans, we propose a simple learning procedure that forces the model to match a long-term view to a short-term view of the same video. Our approach, named Long-Short Temporal Contrastive Learning (LSTCL), enables video transformers to learn an effective clip-level representation by predicting temporal context captured from a longer temporal extent. To demonstrate the generality of our findings, we implement and validate our approach under three different self-supervised contrastive learning frameworks (MoCo v3, BYOL, SimSiam) using two distinct video-transformer architectures, including an improved variant of the Swin Transformer augmented with space-time attention. We conduct a thorough ablation study and show that LSTCL achieves competitive performance on multiple video benchmarks and represents a convincing alternative to supervised image-based pretraining.
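
To make the long-short pairing concrete, the sketch below illustrates the general idea in PyTorch: sample a short clip and a longer clip from the same video and pull their embeddings together with an InfoNCE-style objective of the kind used in MoCo v3. This is not the authors' implementation; the clip-sampling helper, the encoder placeholder, and all hyperparameters (clip lengths, stride, temperature) are illustrative assumptions.

import torch
import torch.nn.functional as F

def sample_long_short_clips(video, short_len=8, long_len=32, stride=2):
    # video: (C, T, H, W) tensor of decoded frames (illustrative layout).
    # Returns a short clip and a longer clip drawn from the same video,
    # mimicking the short-view / long-view pairing described in the abstract.
    T = video.shape[1]
    long_span = long_len * stride
    start_long = torch.randint(0, max(T - long_span, 1), (1,)).item()
    long_clip = video[:, start_long:start_long + long_span:stride]
    short_span = short_len * stride
    start_short = torch.randint(0, max(T - short_span, 1), (1,)).item()
    short_clip = video[:, start_short:start_short + short_span:stride]
    return short_clip, long_clip

def info_nce(q, k, temperature=0.1):
    # Standard InfoNCE over a batch: the short-view embedding of each video
    # should match the long-view embedding of the same video (diagonal positives).
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / temperature          # (B, B) cosine-similarity logits
    labels = torch.arange(q.shape[0], device=q.device)
    return F.cross_entropy(logits, labels)

# Usage sketch: `encoder` stands in for any video transformer mapping a clip
# batch (B, C, T, H, W) to clip-level embeddings (B, D); in MoCo v3 or BYOL the
# long view would typically be processed by a momentum/target copy of it.
#   short_emb = encoder(short_clips)
#   long_emb = momentum_encoder(long_clips)
#   loss = info_nce(short_emb, long_emb)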
Pages: 13990-14000
Page count: 11