TubeR: Tubelet Transformer for Video Action Detection

Cited by: 32
Authors
Zhao, Jiaojiao [1 ]
Zhang, Yanyi [2 ]
Li, Xinyu [3 ]
Chen, Hao [3 ]
Shuai, Bing [3 ]
Xu, Mingze [3 ]
Liu, Chunhui [3 ]
Kundu, Kaustav [3 ]
Xiong, Yuanjun [3 ]
Modolo, Davide [3 ]
Marsic, Ivan [2 ]
Snoek, Cees G. M. [1 ]
Tighe, Joseph [3 ]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
[2] Rutgers State Univ, New Brunswick, NJ USA
[3] AWS AI Labs, Palo Alto, CA USA
DOI
10.1109/CVPR52688.2022.01323
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose TubeR: a simple solution for spatio-temporal video action detection. Unlike existing methods that depend on either an offline actor detector or hand-designed actor-positional hypotheses such as proposals or anchors, we propose to directly detect an action tubelet in a video by simultaneously performing action localization and recognition from a single representation. TubeR learns a set of tubelet-queries and utilizes a tubelet-attention module to model the dynamic spatio-temporal nature of a video clip, which effectively increases model capacity compared to using actor-positional hypotheses in the spatio-temporal space. For videos containing transitional states or scene changes, we propose a context-aware classification head that utilizes short-term and long-term context to strengthen action classification, and an action switch regression head for detecting the precise temporal action extent. TubeR directly produces action tubelets with variable lengths and maintains good results even for long video clips. TubeR outperforms the previous state-of-the-art on the commonly used action detection datasets AVA, UCF101-24 and JHMDB51-21. Code will be available on GluonCV (https://cv.gluon.ai/).
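To make the tubelet-query mechanism described in the abstract concrete, below is a minimal PyTorch sketch of one decoder step: a learned set of queries spanning the clip's frames runs self-attention, cross-attends to flattened video features, and is decoded into per-frame boxes plus a per-tubelet classification. Everything here (class name, shapes, defaults, the single decoder layer, the mean-pooled classifier) is an illustrative assumption, not the authors' implementation; for that, see the GluonCV release referenced in the abstract.

import torch
import torch.nn as nn

# Minimal sketch of the tubelet-query idea, assuming a DETR-style decoder.
# All names, shapes and defaults (TubeletDecoderSketch, num_queries=15, ...)
# are illustrative assumptions, not taken from the released TubeR code.
class TubeletDecoderSketch(nn.Module):
    def __init__(self, embed_dim=256, num_queries=15, clip_len=8,
                 num_classes=80, num_heads=8):
        super().__init__()
        # One learned query per (tubelet, frame): a query sequence spans the
        # clip, so it can follow an actor whose box moves over time.
        self.queries = nn.Parameter(torch.randn(num_queries, clip_len, embed_dim))
        self.self_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.box_head = nn.Linear(embed_dim, 4)                # per-frame (cx, cy, w, h)
        self.cls_head = nn.Linear(embed_dim, num_classes + 1)  # +1 "no action" class

    def forward(self, video_feats):
        # video_feats: (B, T*H*W, embed_dim) flattened backbone features.
        B = video_feats.size(0)
        nq, t, d = self.queries.shape
        q = self.queries.flatten(0, 1).unsqueeze(0).expand(B, -1, -1)  # (B, nq*t, d)
        q = self.self_attn(q, q, q)[0] + q                # queries exchange information
        q = self.cross_attn(q, video_feats, video_feats)[0] + q  # queries read the video
        q = q.view(B, nq, t, d)
        boxes = self.box_head(q).sigmoid()                # (B, nq, t, 4) box per frame
        logits = self.cls_head(q.mean(dim=2))             # (B, nq, classes+1) per tubelet
        return boxes, logits

# Usage with fake backbone features: 2 clips, 8 frames, 14x14 feature maps.
feats = torch.randn(2, 8 * 14 * 14, 256)
boxes, logits = TubeletDecoderSketch()(feats)
print(boxes.shape, logits.shape)  # torch.Size([2, 15, 8, 4]) torch.Size([2, 15, 81])

Because each query carries one embedding per frame, a single tubelet hypothesis can regress a different box at every time step, which is the key difference from frame-level proposals or anchors fixed in the spatio-temporal volume.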
Pages: 13588-13597
Number of pages: 10
Related papers
50 items in total
  • [31] Tubelet-Contrastive Self-Supervision for Video-Efficient Generalization
    Thoker, Fida Mohammad
    Doughty, Hazel
    Snoek, Cees G. M.
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 13766-13777
  • [32] MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
    Chen, Jiawei
    Ho, Chiu Man
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022: 786-797
  • [33] Action-Centric Relation Transformer Network for Video Question Answering
    Zhang, Jipeng
    Shao, Jie
    Cao, Rui
    Gao, Lianli
    Xu, Xing
    Shen, Heng Tao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (01): 63-74
  • [34] Temporal Shift Vision Transformer Adapter for Efficient Video Action Recognition
    Shi, Yaning
    Sun, Pu
    Gu, Bing
    Li, Longfei
    PROCEEDINGS OF 2024 4TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND INTELLIGENT COMPUTING, BIC 2024, 2024: 42-46
  • [35] A Multi-Modal Transformer network for action detection
    Korban, Matthew
    Youngs, Peter
    Acton, Scott T.
    PATTERN RECOGNITION, 2023, 142
  • [36] LGAFormer: transformer with local and global attention for action detection
    Zhang, Haiping
    Zhou, Fuxing
    Wang, Dongjing
    Zhang, Xinhao
    Yu, Dongjin
    Guan, Liming
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (12): 17952-17979
  • [37] End-to-End Temporal Action Detection With Transformer
    Liu, Xiaolong
    Wang, Qimeng
    Hu, Yao
    Tang, Xu
    Zhang, Shiwei
    Bai, Song
    Bai, Xiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31: 5427-5441
  • [38] Improved Deepfake Video Detection Using Convolutional Vision Transformer
    Deressa, Deressa Wodajo
    Lambert, Peter
    Van Wallendael, Glenn
    Atnafu, Solomon
    Mareen, Hannes
    2024 IEEE GAMING, ENTERTAINMENT, AND MEDIA CONFERENCE, GEM 2024, 2024: 492-497
  • [39] Memory-Token Transformer for Unsupervised Video Anomaly Detection
    Li, Youyu
    Song, Xiaoning
    Xu, Tianyang
    Feng, Zhenhua
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 3325-3332
  • [40] Video Relation Detection via Tracklet based Visual Transformer
    Gao, Kaifeng
    Chen, Long
    Huang, Yifeng
    Xiao, Jun
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 4833-4837