Temporal Segment Networks for Action Recognition in Videos

Cited by: 569
Authors
Wang, Limin [1]
Xiong, Yuanjun [2]
Wang, Zhe [3]
Qiao, Yu [4]
Lin, Dahua [5]
Tang, Xiaoou [5]
Van Gool, Luc [6]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] Amazon Web Serv, Seattle, WA 98101 USA
[3] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
[4] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[5] Chinese Univ Hong Kong, Dept Informat Engn, Shatin, Hong Kong, Peoples R China
[6] Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland
Funding
U.S. National Science Foundation
Keywords
Action recognition; temporal segment networks; temporal modeling; good practices; ConvNets; REPRESENTATION; VECTOR;
DOI
10.1109/TPAMI.2018.2868668
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme. This unique design enables the TSN framework to efficiently learn action models by using the whole video. The learned models can be easily deployed for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for the implementation of the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on five challenging action recognition benchmarks: HMDB51 (71.0 percent), UCF101 (94.9 percent), THUMOS14 (80.1 percent), ActivityNet v1.2 (89.6 percent), and Kinetics400 (75.7 percent). In addition, using the proposed RGB difference as a simple motion representation, our method can still achieve competitive accuracy on UCF101 (91.0 percent) while running at 340 FPS. Furthermore, based on the proposed TSN framework, we won the video classification track at the ActivityNet challenge 2016 among 24 teams.
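The segment-based sampling and average-pooling aggregation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sample_segment_indices` and `average_consensus` are hypothetical, the per-snippet class scores are assumed to come from some snippet-level classifier that is not shown, and uniform random sampling within equal-length segments is an assumption about how the scheme is typically realized.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Pick one frame index from each of `num_segments` equal-length segments.

    Hypothetical helper: splits the frame range [0, num_frames) into
    contiguous segments and draws one index uniformly from each.
    """
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        start = (k * num_frames) // num_segments       # inclusive
        end = ((k + 1) * num_frames) // num_segments   # exclusive
        indices.append(rng.randrange(start, end))
    return indices


def average_consensus(snippet_scores):
    """Average per-snippet class scores into one video-level score vector."""
    num_snippets = len(snippet_scores)
    num_classes = len(snippet_scores[0])
    return [
        sum(scores[c] for scores in snippet_scores) / num_snippets
        for c in range(num_classes)
    ]


if __name__ == "__main__":
    # One index per third of a 300-frame video.
    print(sample_segment_indices(300, 3, random.Random(0)))
    # Two snippets, two classes: scores are averaged class by class.
    print(average_consensus([[1.0, 3.0], [3.0, 5.0]]))
```

Because only one snippet per segment is processed, the cost per video depends on the number of segments rather than on the video length, while the sampled snippets still span the whole video.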
Pages: 2740-2755
Number of pages: 16
Related papers
  • [1] Temporal Segment Networks Based on Feature Propagation for Action Recognition
    Shi, Yuexiang
    Zeng, Zhichao
    [J]. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2020, 32 (04) : 582 - 589
  • [2] Unified Spatio-Temporal Attention Networks for Action Recognition in Videos
    Li, Dong
    Yao, Ting
    Duan, Ling-Yu
    Mei, Tao
    Rui, Yong
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (02) : 416 - 428
  • [3] Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
    Wang, Limin
    Xiong, Yuanjun
    Wang, Zhe
    Qiao, Yu
    Lin, Dahua
    Tang, Xiaoou
    Van Gool, Luc
    [J]. COMPUTER VISION - ECCV 2016, PT VIII, 2016, 9912 : 20 - 36
  • [4] Distinct Two-Stream Convolutional Networks for Human Action Recognition in Videos Using Segment-Based Temporal Modeling
    Sarabu, Ashok
    Santra, Ajit Kumar
    [J]. DATA, 2020, 5 (04) : 1 - 12
  • [5] Analysis of Temporal Coherence in Videos for Action Recognition
    Saleh, Adel
    Abdel-Nasser, Mohamed
    Akram, Farhan
    Garcia, Miguel Angel
    Puig, Domenec
    [J]. IMAGE ANALYSIS AND RECOGNITION (ICIAR 2016), 2016, 9730 : 325 - 332
  • [6] Action Recognition in Videos with Temporal Segments Fusions
    Fang, Yuanye
    Zhang, Rui
    Wang, Qiu-Feng
    Huang, Kaizhu
    [J]. ADVANCES IN BRAIN INSPIRED COGNITIVE SYSTEMS, 2020, 11691 : 244 - 253
  • [7] Action Progression Networks for Temporal Action Detection in Videos
    Lu, Chong-Kai
    Mak, Man-Wai
    Li, Ruimin
    Chi, Zheru
    Fu, Hong
    [J]. IEEE ACCESS, 2024, 12 : 126829 - 126844
  • [8] Temporal segment graph convolutional networks for skeleton-based action recognition
    Ding, Chongyang
    Wen, Shan
    Ding, Wenwen
    Liu, Kai
    Belyaev, Evgeny
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 110
  • [9] Sequential Segment Networks for Action Recognition
    Chen, Quan-Qi
    Zhang, Yu-Jin
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2017, 24 (05) : 712 - 716
  • [10] Temporal Segment Connection Network for Action Recognition
    Li, Qian
    Yang, Wenzhu
    Chen, Xiangyang
    Yuan, Tongtong
    Wang, Yuxia
    [J]. IEEE ACCESS, 2020, 8 : 179118 - 179127