Sequential Deep Trajectory Descriptor for Action Recognition With Three-Stream CNN

Cited: 157
Authors
Shi, Yemin [1 ]
Tian, Yonghong [1 ]
Wang, Yaowei [2 ]
Huang, Tiejun [1 ]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci, Cooperat Medianet Innovat Ctr, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China
[2] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; sequential deep trajectory descriptor (sDTD); three-stream framework; long-term motion;
DOI
10.1109/TMM.2017.2666540
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Learning the spatial-temporal representation of motion information is crucial to human action recognition. Nevertheless, most existing features and descriptors cannot capture motion information effectively, especially long-term motion. To address this problem, this paper proposes a long-term motion descriptor called the sequential deep trajectory descriptor (sDTD). Specifically, we project dense trajectories onto two-dimensional planes, and a CNN-RNN network is then employed to learn an effective representation for long-term motion. Unlike the popular two-stream ConvNets, the sDTD stream is introduced into a three-stream framework to identify actions from a video sequence. Consequently, this three-stream framework can simultaneously capture static spatial features, short-term motion, and long-term motion in the video. Extensive experiments were conducted on three challenging datasets: KTH, HMDB51, and UCF101. Experimental results show that our method achieves state-of-the-art performance on the KTH and UCF101 datasets, and is comparable to the state-of-the-art methods on the HMDB51 dataset.
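The three-stream framework described in the abstract combines predictions from a spatial stream, a short-term motion stream, and the sDTD stream. A common way to combine such streams is weighted late fusion of per-stream class scores; the sketch below illustrates that idea under the assumption of simple weighted averaging (the function names, weights, and fusion scheme are illustrative, not the paper's exact method):

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_three_streams(spatial, short_term, sdtd, weights=(1.0, 1.0, 1.0)):
    """Weighted average of class probabilities from the three streams."""
    probs = [softmax(s) for s in (spatial, short_term, sdtd)]
    w_sum = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, probs)) / w_sum
            for i in range(len(spatial))]

# Example: three streams scoring a clip over 3 hypothetical action classes.
fused = fuse_three_streams([2.0, 0.5, 0.1],   # spatial stream logits
                           [1.5, 1.0, 0.2],   # short-term motion logits
                           [2.5, 0.3, 0.1])   # sDTD (long-term) logits
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because all three streams agree here, the fused distribution also favors class 0; in practice, per-stream weights are typically tuned on a validation set.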
Pages: 1510-1520
Page count: 11
Related Papers
50 records in total
  • [1] Trajectory-aware three-stream CNN for video action recognition
    Weng, Zhengkui
    Guan, Yepeng
    JOURNAL OF ELECTRONIC IMAGING, 2019, 28 (02)
  • [2] Three-stream CNNs for action recognition
    Wang, Liangliang
    Ge, Lianzheng
    Li, Ruifeng
    Fang, Yajun
    PATTERN RECOGNITION LETTERS, 2017, 92 : 33 - 40
  • [3] Multi-Modal Three-Stream Network for Action Recognition
    Khalid, Muhammad Usman
    Yu, Jie
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 3210 - 3215
  • [4] Visual Scene Induced Three-stream Network for Efficient Action Recognition
    He, Jun
    Zhao, Xiaochong
    Sun, Bo
    Yu, Xiaocui
    Zhang, Yinghui
    2022 IEEE 10TH INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATION AND NETWORKS (ICICN 2022), 2022, : 550 - 554
  • [5] First-Person Activity Recognition Based on Three-Stream Deep Features
    Kim, Ye-Ji
    Lee, Dong-Gyu
    Lee, Seong-Whan
    2018 18TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS), 2018, : 297 - 299
  • [6] Human Action Adverb Recognition: ADHA Dataset and A Three-Stream Hybrid Model
    Pang, Bo
    Zha, Kaiwen
    Lu, Cewu
    PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 2388 - 2397
  • [7] Three-Stream Graph Convolutional Networks for Zero-Shot Action Recognition
    Wu, Nan
    Kawamoto, Kazuhiko
    2020 JOINT 11TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS AND 21ST INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (SCIS-ISIS), 2020, : 392 - 396
  • [8] Zero-Shot Action Recognition with Three-Stream Graph Convolutional Networks
    Wu, Nan
    Kawamoto, Kazuhiko
    SENSORS, 2021, 21 (11)
  • [9] Beyond Two-stream: Skeleton-based Three-stream Networks for Action Recognition in Videos
    Xu, Jianfeng
    Tasaka, Kazuyuki
    Yanagihara, Hiromasa
    2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2018, : 1567 - 1573
  • [10] LEARNING DEEP TRAJECTORY DESCRIPTOR FOR ACTION RECOGNITION IN VIDEOS USING DEEP NEURAL NETWORKS
    Shi, Yemin
    Zeng, Wei
    Huang, Tiejun
    Wang, Yaowei
2015 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2015