Video-based driver action recognition via hybrid spatial-temporal deep learning framework

Cited by: 8

Authors:

Hu, Yaocong [1,2]

Lu, Mingqi [1,2]

Xie, Chao [3]

Lu, Xiaobo [1,2]

Affiliations:

[1] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China

[2] Southeast Univ, Minist Educ, Key Lab Measurement & Control Complex Syst Engn, Nanjing 210096, Peoples R China

[3] Nanjing Forestry Univ, Coll Mech & Elect Engn, Nanjing 210037, Peoples R China

Source:

MULTIMEDIA SYSTEMS | 2021 / Vol. 27 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

Driver action; Encoder-decoder; Spatial-temporal; Attention; Convolutional long short-term memory; DRIVING POSTURES; CONVOLUTIONAL NETWORKS; INTELLIGENT VEHICLES; TRANSFORM; FEATURES;

DOI:

10.1007/s00530-020-00724-y

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Driver action recognition aims to distinguish normal driver actions from abnormal ones such as leaving the wheel, talking on the phone, and smoking while driving. For the purpose of traffic safety, studies on computer vision technologies for driver action recognition have become especially meaningful. However, this problem is far from solved, mainly because of the subtle variations between different driver action classes. In this paper, we present a new video-based driver action recognition approach based on a hybrid spatial-temporal deep learning framework. Specifically, we first design an encoder-decoder spatial-temporal convolutional neural network (EDSTCNN) to capture short-term spatial-temporal representations of driver actions jointly with optical flow prediction. Second, we exploit a feature refinement network (FRN) to refine the short-term driver action features. Then, a convolutional long short-term memory network (ConvLSTM) is employed for long-term spatial-temporal fusion. Finally, a fully connected neural network (FCNN) is used for the final driver action recognition. In our experiments, we validate the performance of the proposed framework on our self-created datasets, including a simulated driving dataset and a real driving dataset. Extensive experimental results show that the proposed hybrid spatial-temporal deep learning framework obtains the highest accuracy on multiple driver action recognition datasets (98.9% on the SEU-DAR-V1 dataset and 97.0% on the SEU-DAR-V2 dataset).

Pages: 483-501

Page count: 19
