Video-based driver action recognition via hybrid spatial-temporal deep learning framework

Cited: 8
Authors
Hu, Yaocong [1 ,2 ]
Lu, Mingqi [1 ,2 ]
Xie, Chao [3 ]
Lu, Xiaobo [1 ,2 ]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Measurement & Control Complex Syst Engn, Nanjing 210096, Peoples R China
[3] Nanjing Forestry Univ, Coll Mech & Elect Engn, Nanjing 210037, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Driver action; Encoder-decoder; Spatial-temporal; Attention; Convolutional long short-term memory; DRIVING POSTURES; CONVOLUTIONAL NETWORKS; INTELLIGENT VEHICLES; TRANSFORM; FEATURES;
DOI
10.1007/s00530-020-00724-y
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Driver action recognition aims to distinguish normal driver actions from abnormal ones such as leaving the wheel, talking on the phone, and smoking while driving. For the purpose of traffic safety, studies on computer vision technologies for driver action recognition are especially meaningful. However, this problem is far from solved, mainly due to the subtle variations between different driver action classes. In this paper, we present a new video-based driver action recognition approach based on a hybrid spatial-temporal deep learning framework. Specifically, we first design an encoder-decoder spatial-temporal convolutional neural network (EDSTCNN) to capture short-term spatial-temporal representations of driver actions jointly with optical flow prediction. Second, we exploit a feature refinement network (FRN) to refine the short-term driver action features. Then, a convolutional long short-term memory network (ConvLSTM) is employed for long-term spatial-temporal fusion. Finally, a fully connected neural network (FCNN) is used for the final driver action recognition. In our experiments, we validate the performance of the proposed framework on our self-created datasets, including a simulated driving dataset and a real driving dataset. Extensive experimental results show that the proposed hybrid spatial-temporal deep learning framework obtains the highest accuracy on multiple driver action recognition datasets (98.9% on the SEU-DAR-V1 dataset and 97.0% on the SEU-DAR-V2 dataset).
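The abstract outlines a four-stage pipeline: EDSTCNN for short-term features with auxiliary optical flow prediction, FRN for feature refinement, ConvLSTM for long-term fusion, and FCNN for classification. The PyTorch sketch below is a minimal, hypothetical illustration of how such a pipeline could be wired together; all layer sizes, the number of action classes, and the internals of each module are assumptions, not the authors' configuration.

# Hypothetical sketch of the pipeline named in the abstract (EDSTCNN, FRN,
# ConvLSTM, FCNN); layer sizes and class count are assumed, not from the paper.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal convolutional LSTM cell for long-term spatial-temporal fusion."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class DriverActionNet(nn.Module):
    def __init__(self, num_classes=7, hid_ch=64):  # class count is an assumption
        super().__init__()
        self.hid_ch = hid_ch
        # Encoder: short-term spatial features per frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder head: auxiliary optical-flow-like prediction (2 channels).
        self.flow_decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),
        )
        # One conv block standing in for the feature refinement network (FRN).
        self.refine = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.convlstm = ConvLSTMCell(64, hid_ch)
        # Fully connected classifier on the final ConvLSTM hidden state.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(hid_ch, num_classes)
        )

    def forward(self, clip):
        # clip: (batch, time, 3, H, W)
        b, t, _, height, width = clip.shape
        h = clip.new_zeros(b, self.hid_ch, height // 4, width // 4)
        c = torch.zeros_like(h)
        flows = []
        for step in range(t):
            feat = self.encoder(clip[:, step])      # short-term features
            flows.append(self.flow_decoder(feat))   # auxiliary flow prediction
            feat = self.refine(feat)                # refined features
            h, c = self.convlstm(feat, (h, c))      # long-term fusion
        return self.classifier(h), torch.stack(flows, dim=1)

# Usage on a dummy 8-frame clip of 64x64 frames:
logits, flow = DriverActionNet()(torch.randn(2, 8, 3, 64, 64))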
Pages: 483-501
Number of pages: 19