Sequential Deep Trajectory Descriptor for Action Recognition With Three-Stream CNN

Cited by: 157
Authors
Shi, Yemin [1 ]
Tian, Yonghong [1 ]
Wang, Yaowei [2 ]
Huang, Tiejun [1 ]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci, Cooperat Medianet Innovat Ctr, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China
[2] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; sequential deep trajectory descriptor (sDTD); three-stream framework; long-term motion;
DOI
10.1109/TMM.2017.2666540
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Learning the spatial-temporal representation of motion information is crucial to human action recognition. Nevertheless, most of the existing features or descriptors cannot capture motion information effectively, especially for long-term motion. To address this problem, this paper proposes a long-term motion descriptor called sequential deep trajectory descriptor (sDTD). Specifically, we project dense trajectories into two-dimensional planes, and subsequently a CNN-RNN network is employed to learn an effective representation for long-term motion. Unlike the popular two-stream ConvNets, the sDTD stream is introduced into a three-stream framework so as to identify actions from a video sequence. Consequently, this three-stream framework can simultaneously capture static spatial features, short-term motion, and long-term motion in the video. Extensive experiments were conducted on three challenging datasets: KTH, HMDB51, and UCF101. Experimental results show that our method achieves state-of-the-art performance on the KTH and UCF101 datasets, and is comparable to the state-of-the-art methods on the HMDB51 dataset.
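As a rough illustration of the architecture the abstract describes (a CNN-RNN stream over projected dense-trajectory images, fused with spatial and short-term motion streams), the sketch below shows one possible three-stream layout in PyTorch. The backbone (ResNet-18), GRU hidden size, flow-stack depth, and score-averaging fusion are assumptions for illustration only, not the authors' reported configuration.

```python
# Minimal sketch of a three-stream network in the spirit of the abstract.
# All module choices here (ResNet-18 backbones, GRU aggregation, late fusion
# by score averaging) are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class SDTDStream(nn.Module):
    """CNN-RNN stream: per-frame CNN features over trajectory-projection
    images, aggregated by a GRU to capture long-term motion."""
    def __init__(self, num_classes, hidden=512):
        super().__init__()
        cnn = resnet18(weights=None)
        cnn.fc = nn.Identity()              # keep 512-d per-frame features
        self.cnn = cnn
        self.rnn = nn.GRU(512, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                   # x: (B, T, 3, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        _, h = self.rnn(feats)              # final hidden state summarizes the sequence
        return self.fc(h[-1])


class ThreeStreamNet(nn.Module):
    """Spatial (RGB), short-term motion (stacked optical flow), and sDTD-style
    streams, fused by averaging their class scores."""
    def __init__(self, num_classes=101):
        super().__init__()
        self.spatial = resnet18(weights=None, num_classes=num_classes)
        self.temporal = resnet18(weights=None, num_classes=num_classes)
        # assume 10 flow frames x (dx, dy) = 20 input channels for the flow stream
        self.temporal.conv1 = nn.Conv2d(20, 64, 7, stride=2, padding=3, bias=False)
        self.sdtd = SDTDStream(num_classes)

    def forward(self, rgb, flow, traj_seq):
        return (self.spatial(rgb) + self.temporal(flow) + self.sdtd(traj_seq)) / 3


if __name__ == "__main__":
    net = ThreeStreamNet(num_classes=101)
    rgb = torch.randn(2, 3, 224, 224)       # single RGB frame per clip
    flow = torch.randn(2, 20, 224, 224)      # stacked optical flow
    traj = torch.randn(2, 8, 3, 224, 224)    # sequence of trajectory projections
    print(net(rgb, flow, traj).shape)        # -> torch.Size([2, 101])
```

In this sketch the three streams are trained and evaluated independently and only their softmax-input scores are averaged; the paper's actual fusion strategy and training details should be taken from the full text.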
Pages: 1510-1520
Number of pages: 11
Related Papers
50 records in total
  • [41] Multiple stream deep learning model for human action recognition
    Gu, Ye
    Ye, Xiaofeng
    Sheng, Weihua
    Ou, Yongsheng
    Li, Yongqiang
    IMAGE AND VISION COMPUTING, 2020, 93
  • [42] Action Recognition Using Multi-stream 2D CNN with Deep Learning-Based Temporal Modality
    Kang, Keonwoo
    Park, Sangwoo
    Park, Hasil
    Kang, Donggoo
    Paik, Joonki
    2023 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, ICCE, 2023
  • [43] Typhoon Trajectory Prediction by Three CNN+ Deep-Learning Approaches
    Lin, Gang
    Liang, Yanchun
    Tavares, Adriano
    Lima, Carlos
    Xia, Dong
    ELECTRONICS, 2024, 13 (19)
  • [44] Facial micro-expression recognition using three-stream vision transformer network with sparse sampling and relabeling
    Zhang, He
    Yin, Lu
    Zhang, Hanling
    Wu, Xuesong
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (04) : 3761 - 3771
  • [45] Facial micro-expression recognition using three-stream vision transformer network with sparse sampling and relabeling
    He Zhang
    Lu Yin
    Hanling Zhang
    Xuesong Wu
    Signal, Image and Video Processing, 2024, 18 : 3761 - 3771
  • [46] Binary dense sift flow based two stream CNN for human action recognition
    Park, Sang Kyoo
    Chung, Jun Ho
    Kang, Tae Koo
    Lim, Myo Taeg
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (28-29) : 35697 - 35720
  • [47] LEARNING GEOMETRIC FEATURES WITH DUAL-STREAM CNN FOR 3D ACTION RECOGNITION
    Thien Huynh-The
    Hua, Cam-Hao
    Nguyen Anh Tu
    Kim, Dong-Seong
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 2353 - 2357
  • [48] DKD–DAD: a novel framework with discriminative kinematic descriptor and deep attention-pooled descriptor for action recognition
    Ming Tong
    Mingyang Li
    He Bai
    Lei Ma
    Mengao Zhao
    Neural Computing and Applications, 2020, 32 : 5285 - 5302
  • [49] Two-Stream RNN/CNN for Action Recognition in 3D Videos
    Zhao, Rui
    Ali, Haider
    van der Smagt, Patrick
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 4260 - 4267
  • [50] Binary dense sift flow based two stream CNN for human action recognition
    Sang Kyoo Park
    Jun Ho Chung
    Tae Koo Kang
    Myo Taeg Lim
    Multimedia Tools and Applications, 2021, 80 : 35697 - 35720