TubeR: Tubelet Transformer for Video Action Detection

Cited by: 32

Authors:

Zhao, Jiaojiao [1]

Zhang, Yanyi [2]

Li, Xinyu [3]

Chen, Hao [3]

Shuai, Bing [3]

Xu, Mingze [3]

Liu, Chunhui [3]

Kundu, Kaustav [3]

Xiong, Yuanjun [3]

Modolo, Davide [3]

Marsic, Ivan [2]

Snoek, Cees G. M. [1]

Tighe, Joseph [3]

Affiliations:

[1] Univ Amsterdam, Amsterdam, Netherlands

[2] Rutgers State Univ, New Brunswick, NJ USA

[3] AWS AI Labs, Palo Alto, CA USA

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2022

DOI:

10.1109/CVPR52688.2022.01323

Chinese Library Classification (CLC) code:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose TubeR: a simple solution for spatio-temporal video action detection. Different from existing methods that depend on either an off-line actor detector or hand-designed actor-positional hypotheses like proposals or anchors, we propose to directly detect an action tubelet in a video by simultaneously performing action localization and recognition from a single representation. TubeR learns a set of tubelet-queries and utilizes a tubelet-attention module to model the dynamic spatio-temporal nature of a video clip, which effectively reinforces the model capacity compared to using actor-positional hypotheses in the spatio-temporal space. For videos containing transitional states or scene changes, we propose a context aware classification head to utilize short-term and long-term context to strengthen action classification, and an action switch regression head for detecting the precise temporal action extent. TubeR directly produces action tubelets with variable lengths and even maintains good results for long video clips. TubeR outperforms the previous state-of-the-art on commonly used action detection datasets AVA, UCF101-24 and JHMDB51-21. Code will be available on GluonCV (https://cv.gluon.ai/).

Pages: 13588-13597

Page count: 10
