Sequential Deep Trajectory Descriptor for Action Recognition With Three-Stream CNN

Cited by: 157
Authors
Shi, Yemin [1]
Tian, Yonghong [1]
Wang, Yaowei [2]
Huang, Tiejun [1]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci, Cooperat Medianet Innovat Ctr, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China
[2] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; sequential deep trajectory descriptor (sDTD); three-stream framework; long-term motion
DOI
10.1109/TMM.2017.2666540
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Learning the spatial-temporal representation of motion information is crucial to human action recognition. Nevertheless, most of the existing features or descriptors cannot capture motion information effectively, especially for long-term motion. To address this problem, this paper proposes a long-term motion descriptor called sequential deep trajectory descriptor (sDTD). Specifically, we project dense trajectories into two-dimensional planes, and subsequently a CNN-RNN network is employed to learn an effective representation for long-term motion. Unlike the popular two-stream ConvNets, the sDTD stream is introduced into a three-stream framework so as to identify actions from a video sequence. Consequently, this three-stream framework can simultaneously capture static spatial features, short-term motion, and long-term motion in the video. Extensive experiments were conducted on three challenging datasets: KTH, HMDB51, and UCF101. Experimental results show that our method achieves state-of-the-art performance on the KTH and UCF101 datasets, and is comparable to the state-of-the-art methods on the HMDB51 dataset.
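The projection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `project_trajectories`, the binary occupancy encoding, and the rounding of coordinates to the nearest pixel are all assumptions made here for illustration. The abstract only states that dense trajectories are projected into two-dimensional planes before a CNN-RNN network processes them.

```python
import math


def project_trajectories(trajectories, height, width):
    """Rasterize point trajectories onto one 2D plane (hypothetical helper).

    Each trajectory is a list of (x, y) pixel coordinates, one per frame.
    Returns a height x width grid where 1 marks a cell that some
    trajectory passes through. Binary occupancy is an assumption; the
    paper may encode the planes differently.
    """
    plane = [[0] * width for _ in range(height)]
    for trajectory in trajectories:
        for x, y in trajectory:
            # Round to the nearest pixel; floor(v + 0.5) avoids the
            # round-half-to-even behaviour of Python's built-in round().
            col = math.floor(x + 0.5)
            row = math.floor(y + 0.5)
            # Points that fall outside the plane are skipped.
            if 0 <= row < height and 0 <= col < width:
                plane[row][col] = 1
    return plane


# Two toy trajectories; the last point of the second one lies outside
# the 3 x 4 plane and is ignored.
toy_trajectories = [
    [(0.0, 0.0), (1.2, 0.9), (2.0, 2.1)],
    [(3.0, 0.0), (5.0, 0.0)],
]
toy_plane = project_trajectories(toy_trajectories, height=3, width=4)
```

In a pipeline like the one the abstract describes, a sequence of such planes (one per trajectory segment) would presumably be passed to the CNN-RNN network; that sequencing is likewise an assumption here.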
Pages: 1510 - 1520
Page count: 11
Related papers
50 in total
  • [21] Tifar-net: three-stream inception former-based action recognition network for infrared videos
    Imran, Javed
    Rajput, Amitesh Singh
    Vashisht, Rohit
    SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (02)
  • [22] Three-stream fusion network for first-person interaction recognition
    Kim, Ye-Ji
    Lee, Dong-Gyu
    Lee, Seong-Whan
    PATTERN RECOGNITION, 2020, 103
  • [23] DSRF: A flexible trajectory descriptor for articulated human action recognition
    Guo, Yao
    Li, Youfu
    Shao, Zhanpeng
    PATTERN RECOGNITION, 2018, 76 : 137 - 148
  • [24] Motion keypoint trajectory and covariance descriptor for human action recognition
    Yi, Yun
    Wang, Hanli
    VISUAL COMPUTER, 2018, 34 (03) : 391 - 403
  • [25] Motion keypoint trajectory and covariance descriptor for human action recognition
    Yun Yi
    Hanli Wang
    The Visual Computer, 2018, 34 : 391 - 403
  • [26] Three-Stream Convolutional Neural Network with Multi-task and Ensemble Learning for 3D Action Recognition
    Liang, Duohan
    Fan, Guoliang
    Lin, Guangfeng
    Chen, Wanjun
    Pan, Xiaorong
    Zhu, Hong
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019 : 934 - 940
  • [27] 3s-STNet: three-stream spatial–temporal network with appearance and skeleton information learning for action recognition
    Ming Fang
    Siyu Peng
    Yang Zhao
    Haibo Yuan
    Chih-Cheng Hung
    Shuhua Liu
    Neural Computing and Applications, 2023, 35 : 1835 - 1848
  • [28] Skeleton-based human action recognition by fusing attention based three-stream convolutional neural network and SVM
    Fang Ren
    Chao Tang
    Anyang Tong
    Wenjian Wang
    Multimedia Tools and Applications, 2024, 83 : 6273 - 6295
  • [29] Skeleton-based human action recognition by fusing attention based three-stream convolutional neural network and SVM
    Ren, Fang
    Tang, Chao
    Tong, Anyang
    Wang, Wenjian
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (2) : 6273 - 6295
  • [30] Hyperspectral and LiDAR Fusion Using Deep Three-Stream Convolutional Neural Networks
    Li, Hao
    Ghamisi, Pedram
    Soergel, Uwe
    Zhu, Xiao Xiang
    REMOTE SENSING, 2018, 10 (10)