Video-based driver emotion recognition using hybrid deep spatio-temporal feature learning

Cited by: 2
Authors
Varma, Harshit [1 ]
Ganapathy, Nagarajan [2 ,3 ]
Deserno, Thomas M. [2 ,3 ]
Affiliations
[1] Indian Inst Technol, Mumbai, Maharashtra, India
[2] TU Braunschweig, Peter L Reichertz Inst Med Informat, Braunschweig, Germany
[3] Hannover Med Sch, Braunschweig, Germany
Keywords
Driver Emotion; Emotion Recognition; Facial Expression; Video Processing; Convolutional Neural Network; Long Short-Term Memory; Classification; Deep Learning
DOI
10.1117/12.2613118
Chinese Library Classification
R-058
Abstract
Road traffic crashes have become the leading cause of death for young people. Approximately 1.3 million people die in road traffic crashes each year, and more than 30 million suffer non-fatal injuries. Various studies have shown that emotions influence driving performance. In this work, we focus on frame-level, video-based categorical emotion recognition in drivers. We propose a Convolutional Bidirectional Long Short-Term Memory Neural Network (CBiLSTM) architecture to effectively capture the spatio-temporal features of the video data. Facial videos of drivers are obtained from two publicly available datasets, the Keimyung University Facial Expression of Drivers (KMU-FED) dataset and a subset of the Driver Monitoring Dataset (DMD), as well as from an experimental dataset. Firstly, we extract the face region from the video frames using the Facial Alignment Network (FAN). Secondly, these face regions are encoded using a lightweight SqueezeNet CNN model. The output of the CNN is fed into a two-layered BiLSTM network for spatio-temporal feature learning. Finally, a fully-connected layer outputs the emotion class softmax probabilities. Furthermore, we enable interpretable visualizations of the results using Axiom-based Grad-CAM (XGrad-CAM). For this study, we manually annotated the DMD subset and our experimental dataset using an interactive annotation tool. Our framework achieves an F1-score of 0.958 on the KMU-FED dataset. Evaluated with Leave-One-Out Cross-Validation (LOOCV), it achieves average F1-scores of 0.745 on the DMD subset and 0.414 on the experimental dataset.
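The abstract's pipeline (per-frame CNN encoding, a two-layer BiLSTM over the frame sequence, and a fully-connected softmax head) can be sketched in PyTorch as below. This is a minimal illustrative sketch, not the authors' implementation: the tiny convolutional encoder stands in for the SqueezeNet backbone, and the class count, feature dimension, and hidden size are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn

class CBiLSTM(nn.Module):
    """Illustrative sketch of the CBiLSTM described in the abstract:
    a CNN encodes each face frame, a two-layer bidirectional LSTM
    models the frame sequence, and a fully-connected layer yields
    per-frame emotion-class softmax probabilities."""

    def __init__(self, n_classes=6, feat_dim=128, hidden=64):
        super().__init__()
        # Per-frame spatial encoder (placeholder for SqueezeNet).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, feat_dim),
        )
        # Two-layer bidirectional LSTM for temporal feature learning.
        self.bilstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        # Fully-connected classification head (2*hidden: both directions).
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, 3, H, W)
        b, t = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1))  # (batch*time, feat_dim)
        feats = feats.view(b, t, -1)           # restore the time axis
        out, _ = self.bilstm(feats)            # (batch, time, 2*hidden)
        return self.fc(out).softmax(-1)        # frame-level probabilities

# Shape check on a dummy clip: 2 videos, 8 frames of 64x64 face crops.
probs = CBiLSTM()(torch.randn(2, 8, 3, 64, 64))
print(probs.shape)  # torch.Size([2, 8, 6])
```

The frame-level output (one probability vector per frame) matches the abstract's framing of the task; in the paper, the inputs would be FAN-cropped face regions rather than raw frames.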
Pages: 7