Video-based driver action recognition via hybrid spatial-temporal deep learning framework

Cited by: 8
Authors
Hu, Yaocong [1, 2]
Lu, Mingqi [1, 2]
Xie, Chao [3]
Lu, Xiaobo [1, 2]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Measurement & Control Complex Syst Engn, Nanjing 210096, Peoples R China
[3] Nanjing Forestry Univ, Coll Mech & Elect Engn, Nanjing 210037, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Driver action; Encoder-decoder; Spatial-temporal; Attention; Convolutional long short-term memory; DRIVING POSTURES; CONVOLUTIONAL NETWORKS; INTELLIGENT VEHICLES; TRANSFORM; FEATURES;
DOI
10.1007/s00530-020-00724-y
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Driver action recognition aims to distinguish normal driving from abnormal driver actions such as taking the hands off the wheel, talking on the phone, and driving while smoking. Because such actions threaten traffic safety, computer vision techniques for recognizing them have become especially important. However, the problem remains far from solved, mainly because different driver action classes differ only subtly from one another. In this paper, we present a new video-based driver action recognition approach built on a hybrid spatial-temporal deep learning framework. Specifically, we first design an encoder-decoder spatial-temporal convolutional neural network (EDSTCNN) that captures short-term spatial-temporal representations of driver actions jointly with optical flow prediction. Second, a feature refinement network (FRN) refines these short-term driver action features. Third, a convolutional long short-term memory network (ConvLSTM) performs long-term spatial-temporal fusion. Finally, a fully connected neural network (FCNN) produces the final driver action classification. We validate the proposed framework on our self-created datasets, comprising a simulated driving dataset and a real driving dataset. Extensive experimental results show that the proposed hybrid spatial-temporal deep learning framework achieves the highest accuracy on these driver action recognition datasets: 98.9% on SEU-DAR-V1 and 97.0% on SEU-DAR-V2.
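The abstract places a ConvLSTM after the short-term feature stages to fuse features over a longer time span, followed by a fully connected classifier. The NumPy sketch below illustrates only those last two stages under stated assumptions; it is not the authors' implementation. The EDSTCNN and FRN outputs are replaced by random feature maps, the ConvLSTM cell omits the peephole terms of the original ConvLSTM formulation, and all helper names (`conv2d_same`, `ConvLSTMCell`, `fuse_clips`, `fc_classify`), channel counts, kernel size, number of clips, and number of classes are hypothetical.

```python
import numpy as np

def conv2d_same(x, w):
    """'Same'-padded 2D cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H and W unchanged
    H, W = x.shape[1:]
    out = np.zeros((c_out, H, W))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j), summed over input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Minimal ConvLSTM cell without peephole connections (hypothetical sizes)."""
    def __init__(self, in_ch, hid_ch, k=3, seed=0):
        rng = np.random.default_rng(seed)
        # One kernel produces all four gates (input, forget, output, candidate) from [x, h].
        self.w = rng.normal(0.0, 0.1, (4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros((4 * hid_ch, 1, 1))
        self.hid_ch = hid_ch

    def step(self, x, h, c):
        z = conv2d_same(np.concatenate([x, h], axis=0), self.w) + self.b
        i, f, o, g = np.split(z, 4, axis=0)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state keeps spatial layout
        h = sigmoid(o) * np.tanh(c)
        return h, c

def fuse_clips(clip_features, cell):
    """Long-term fusion: run the cell over T short-term feature maps (T, C, H, W)
    and return the final hidden state (hid_ch, H, W)."""
    _, _, H, W = clip_features.shape
    h = np.zeros((cell.hid_ch, H, W))
    c = np.zeros_like(h)
    for x in clip_features:
        h, c = cell.step(x, h, c)
    return h

def fc_classify(h, w_fc, b_fc):
    """Fully connected layer on the flattened hidden state, then softmax."""
    logits = w_fc @ h.reshape(-1) + b_fc
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

# Usage with hypothetical sizes: 5 clips of 8-channel 7x7 features, 6 action classes.
rng = np.random.default_rng(1)
feats = rng.normal(size=(5, 8, 7, 7))
cell = ConvLSTMCell(in_ch=8, hid_ch=4)
h = fuse_clips(feats, cell)
w_fc = rng.normal(0.0, 0.01, (6, h.size))
probs = fc_classify(h, w_fc, np.zeros(6))
print(h.shape, probs.shape)
```

The convolutional gates let the recurrent state keep a spatial layout, so each time step updates a feature map rather than a flat vector; flattening happens only at the classifier.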
Pages: 483-501
Number of pages: 19