Sequential Deep Trajectory Descriptor for Action Recognition With Three-Stream CNN

Cited by: 157
|
Authors
Shi, Yemin [1 ]
Tian, Yonghong [1 ]
Wang, Yaowei [2 ]
Huang, Tiejun [1 ]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci, Cooperat Medianet Innovat Ctr, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China
[2] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; sequential deep trajectory descriptor (sDTD); three-stream framework; long-term motion;
DOI
10.1109/TMM.2017.2666540
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Learning the spatial-temporal representation of motion information is crucial to human action recognition. Nevertheless, most of the existing features or descriptors cannot capture motion information effectively, especially for long-term motion. To address this problem, this paper proposes a long-term motion descriptor called sequential deep trajectory descriptor (sDTD). Specifically, we project dense trajectories into two-dimensional planes, and subsequently a CNN-RNN network is employed to learn an effective representation for long-term motion. Unlike the popular two-stream ConvNets, the sDTD stream is introduced into a three-stream framework so as to identify actions from a video sequence. Consequently, this three-stream framework can simultaneously capture static spatial features, short-term motion, and long-term motion in the video. Extensive experiments were conducted on three challenging datasets: KTH, HMDB51, and UCF101. Experimental results show that our method achieves state-of-the-art performance on the KTH and UCF101 datasets, and is comparable to the state-of-the-art methods on the HMDB51 dataset.
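The abstract's three-stream design is typically combined at the score level: each stream (static spatial, short-term motion, and the sDTD long-term motion stream) produces per-class scores, and the final prediction is a weighted fusion of their softmax probabilities. The sketch below illustrates that late-fusion step only; the stream weights and function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def fuse_three_streams(spatial_scores, short_term_scores, sdtd_scores,
                       weights=(1.0, 1.0, 1.0)):
    """Late-fuse per-class scores from three streams by a weighted average
    of their softmax probabilities. Returns (predicted_class, fused_probs).
    The equal default weights are a placeholder; in practice they would be
    tuned on a validation set."""
    probs = [softmax(np.asarray(s, dtype=float))
             for s in (spatial_scores, short_term_scores, sdtd_scores)]
    w = np.asarray(weights, dtype=float)
    fused = sum(wi * p for wi, p in zip(w, probs)) / w.sum()
    return int(np.argmax(fused)), fused
```

With equal weights, a class favored by two of the three streams wins even if the remaining stream disagrees, which is the intuition behind fusing complementary spatial, short-term, and long-term evidence.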
Pages: 1510-1520
Page count: 11
Related Papers
50 records in total
  • [31] 3s-STNet: three-stream spatial-temporal network with appearance and skeleton information learning for action recognition
    Fang, Ming
    Peng, Siyu
    Zhao, Yang
    Yuan, Haibo
    Hung, Chih-Cheng
    Liu, Shuhua
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (02): : 1835 - 1848
  • [32] Dynamic Gesture Recognition Based on Three-Stream Coordinate Attention Network and Knowledge Distillation
    Wan, Shanshan
    Yang, Lan
    Ding, Keliang
    Qiu, Dongwei
    IEEE ACCESS, 2023, 11 : 50547 - 50559
  • [33] Human Action Recognition With Trajectory Based Covariance Descriptor In Unconstrained Videos
    Wang, Hanli
    Yi, Yun
    Wu, Jun
    MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE, 2015, : 1175 - 1178
  • [34] Deep temporal motion descriptor (DTMD) for human action recognition
    Nida, Nudrat
    Yousaf, Muhammad Haroon
    Irtaza, Aun
    Velastin, Sergio A.
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2020, 28 (03) : 1371 - 1385
  • [35] Ensemble Three-Stream RGB-S Deep Neural Network for Human Behavior Recognition Under Intelligent Home Service Robot Environments
    Byeon, Yeong-Hyeon
    Kim, Dohyung
    Lee, Jaeyeon
    Kwak, Keun-Chang
    IEEE ACCESS, 2021, 9 : 73240 - 73250
  • [36] Three-Stream Network With Bidirectional Self-Attention for Action Recognition in Extreme Low Resolution Videos (vol 26, pg 1187, 2019)
    Purwanto, Didik
    Pramono, Rizard Renanda Adhi
    Chen, Yie-Tarng
    Fang, Wen-Hsien
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 2188 - 2188
  • [37] Building roof wireframe extraction from aerial images using a three-stream deep neural network
    Esmaeily, Zahra
    Rezaeian, Mehdi
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (01)
  • [38] A three-stream fusion network for 3D skeleton-based action recognition
    Ming Fang
    Qi Liu
    Jianping Ren
    Jie Li
    Xinning Du
    Shuhua Liu
    Multimedia Systems, 2025, 31 (3)
  • [39] NIRExpNet: Three-Stream 3D Convolutional Neural Network for Near Infrared Facial Expression Recognition
    Wu, Zhan
    Chen, Tong
    Chen, Ying
    Zhang, Zhihao
    Liu, Guangyuan
    APPLIED SCIENCES-BASEL, 2017, 7 (11):
  • [40] Two-stream Deep Representation for Human Action Recognition
    Ghrab, Najla Bouarada
    Fendri, Emna
    Hammami, Mohamed
    FOURTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2021), 2022, 12084