An Empirical Study of End-to-End Temporal Action Detection

Cited by: 13
Authors
Liu, Xiaolong [1]
Bai, Song [2]
Bai, Xiang [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] ByteDance Inc, Wuhan, Peoples R China
Funding
National Key Research and Development Program;
Keywords
NETWORK;
DOI
10.1109/CVPR52688.2022.01938
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Temporal action detection (TAD) is an important yet challenging task in video understanding. It aims to simultaneously predict the semantic label and the temporal interval of every action instance in an untrimmed video. Rather than end-to-end learning, most existing methods adopt a head-only learning paradigm, where the video encoder is pre-trained for action classification, and only the detection head upon the encoder is optimized for TAD. The effect of end-to-end learning has not been systematically evaluated. Moreover, an in-depth study of the efficiency-accuracy trade-off in end-to-end TAD is lacking. In this paper, we present an empirical study of end-to-end temporal action detection. We validate the advantage of end-to-end learning over head-only learning and observe up to 11% performance improvement. We also study the effects of multiple design choices that affect TAD performance and speed, including the detection head, the video encoder, and the resolution of input videos. Based on the findings, we build a mid-resolution baseline detector, which achieves the state-of-the-art performance of end-to-end methods while running more than 4x faster. We hope that this paper can serve as a guide for end-to-end learning and inspire future research in this field.
Pages: 19978-19987
Page count: 10