Video-based driver action recognition via hybrid spatial-temporal deep learning framework

Cited by: 8
Authors
Hu, Yaocong [1, 2]
Lu, Mingqi [1, 2]
Xie, Chao [3]
Lu, Xiaobo [1, 2]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Measurement & Control Complex Syst Engn, Nanjing 210096, Peoples R China
[3] Nanjing Forestry Univ, Coll Mech & Elect Engn, Nanjing 210037, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Driver action; Encoder-decoder; Spatial-temporal; Attention; Convolutional long short-term memory; DRIVING POSTURES; CONVOLUTIONAL NETWORKS; INTELLIGENT VEHICLES; TRANSFORM; FEATURES;
DOI
10.1007/s00530-020-00724-y
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Driver action recognition aims to distinguish normal driver action from abnormal driver actions such as leaving the wheel, talking on the phone, and driving while smoking. For the sake of traffic safety, research on computer vision techniques for driver action recognition has become especially important. However, the problem is far from solved, mainly because of the subtle variations between different driver action classes. In this paper, we present a new video-based driver action recognition approach based on a hybrid spatial-temporal deep learning framework. Specifically, we first design an encoder-decoder spatial-temporal convolutional neural network (EDSTCNN) to capture a short-term spatial-temporal representation of driver actions jointly with optical flow prediction. Second, we exploit a feature refinement network (FRN) to refine the short-term driver action features. Then, a convolutional long short-term memory network (ConvLSTM) is employed for long-term spatial-temporal fusion. Finally, a fully connected neural network (FCNN) is used for final driver action recognition. In our experiments, we validate the performance of the proposed framework on our self-created datasets, including a simulated driving dataset and a real driving dataset. Extensive experimental results show that the proposed hybrid spatial-temporal deep learning framework obtains the highest accuracy on multiple driver action recognition datasets (98.9% on the SEU-DAR-V1 dataset and 97.0% on the SEU-DAR-V2 dataset).
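The four-stage data flow named in the abstract (short-term encoding per clip, feature refinement, long-term recurrent fusion, fully connected classification) can be sketched as follows. This is only an illustrative NumPy sketch under stated assumptions, not the authors' implementation: every stage is a simplified stand-in with random weights, the optical flow prediction branch of the EDSTCNN is omitted, the refinement step is assumed to be a spatial attention reweighting, the ConvLSTM uses 1x1 kernels for brevity, and all function names, tensor sizes, and the number of classes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_into_clips(video, clip_len):
    """Cut a (T, H, W, C) video into non-overlapping clips of clip_len frames."""
    n = video.shape[0] // clip_len
    return video[: n * clip_len].reshape(n, clip_len, *video.shape[1:])

def short_term_encoder(clip, w_enc):
    """Stand-in for the EDSTCNN: average the clip's frames, then project channels."""
    pooled = clip.mean(axis=0)                    # (H, W, C)
    return np.tanh(pooled @ w_enc)                # (H, W, D)

def refine(feat):
    """Stand-in for the FRN (assumed here to be a spatial attention reweighting)."""
    energy = feat.sum(axis=-1)                    # (H, W)
    attn = np.exp(energy - energy.max())
    attn /= attn.sum()                            # softmax over all locations
    return feat * attn[..., None] * attn.size     # rescale so the mean weight is 1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_lstm_step(x, h, c, w, b):
    """One ConvLSTM step with 1x1 kernels (a per-location matrix product)."""
    z = np.concatenate([x, h], axis=-1) @ w + b   # (H, W, 4D)
    i, f, o, g = np.split(z, 4, axis=-1)
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_next = sigmoid(o) * np.tanh(c_next)
    return h_next, c_next

def classify(h, w_fc):
    """Stand-in for the FCNN: global average pooling, linear layer, softmax."""
    logits = h.mean(axis=(0, 1)) @ w_fc           # (num_classes,)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def init_params(channels=3, d=8, num_classes=4):
    """Random, untrained weights; all sizes are hypothetical."""
    return {
        "w_enc": rng.normal(0.0, 0.1, (channels, d)),
        "w_lstm": rng.normal(0.0, 0.1, (2 * d, 4 * d)),
        "b_lstm": np.zeros(4 * d),
        "w_fc": rng.normal(0.0, 0.1, (d, num_classes)),
    }

def recognize(video, params, clip_len=4):
    """Run the four stages over a video and return class probabilities."""
    d = params["w_enc"].shape[1]
    height, width = video.shape[1:3]
    h = np.zeros((height, width, d))
    c = np.zeros((height, width, d))
    for clip in split_into_clips(video, clip_len):
        feat = refine(short_term_encoder(clip, params["w_enc"]))
        h, c = conv_lstm_step(feat, h, c, params["w_lstm"], params["b_lstm"])
    return classify(h, params["w_fc"])

video = rng.random((16, 6, 6, 3))                 # 16 synthetic RGB frames of 6x6
probs = recognize(video, init_params())
```

With untrained weights the output carries no meaning beyond being a valid probability vector over the hypothetical classes; the sketch is meant only to show how per-clip features feed a recurrent state that is classified once at the end.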
Pages: 483-501
Number of pages: 19
Related papers
50 in total
  • [31] Spatial-Temporal Correlation for Trajectory based Action Video Retrieval
    Shen, Xi
    Zhang, Lelin
    Wang, Zhiyong
    Feng, Dagan
    [J]. 2015 IEEE 17TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2015,
  • [32] Advanced skeleton-based action recognition via spatial-temporal rotation descriptors
    Shen, Zhongwei
    Wu, Xiao-Jun
    Kittler, Josef
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2021, 24 (03) : 1335 - 1346
  • [33] Human action recognition via multi-task learning base on spatial-temporal feature
    Guo, Wenzhong
    Chen, Guolong
    [J]. INFORMATION SCIENCES, 2015, 320 : 418 - 428
  • [34] Learning spatial-temporal features via a pose-flow relational model for action recognition
    Wu, Qianyu
    Hu, Fangqiang
    Zhu, Aichun
    Wang, Zixuan
    Bao, Yaping
    [J]. AIP ADVANCES, 2020, 10 (07)
  • [35] Spatial-temporal interaction learning based two-stream network for action recognition
    Liu, Tianyu
    Ma, Yujun
    Yang, Wenhan
    Ji, Wanting
    Wang, Ruili
    Jiang, Ping
    [J]. INFORMATION SCIENCES, 2022, 606 : 864 - 876
  • [36] Video-based framework for face recognition in video
    Gorodnichy, DO
    [J]. 2ND CANADIAN CONFERENCE ON COMPUTER AND ROBOT VISION, PROCEEDINGS, 2005, : 330 - 338
  • [37] A new framework for deep learning video based Human Action Recognition on the edge
    Cob-Parro, Antonio Carlos
    Losada-Gutierrez, Cristina
    Marron-Romera, Marta
    Gardel-Vicente, Alfredo
    Bravo-Munoz, Ignacio
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [38] Video Action Recognition by Combining Spatial-Temporal Cues with Graph Convolutional Networks
    Li, Tao
    Xiong, Wenjun
    Zhang, Zheng
    Pei, Lishen
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2023,
  • [40] Spatial-temporal aware network for video-based person re-identification
    Wang, Jun
    Zhao, Qi
    Jia, Di
    Huang, Ziqing
    Zhang, Miaohui
    Ren, Xing
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (12) : 36355 - 36373