Explainable Driver Activity Recognition Using Video Transformer in Highly Automated Vehicle

Times Cited: 1
Authors
Sonth, Akash [1 ]
Sarkar, Abhijit [2 ]
Bhagat, Hirva [3 ]
Abbott, Lynn [1 ]
Affiliations
[1] Virginia Tech, Bradley Dept Elect & Comp Engn, Blacksburg, VA 24061 USA
[2] Virginia Tech Transportat Inst, Blacksburg, VA USA
[3] Virginia Tech, Dept Comp Sci, Blacksburg, VA USA
Keywords
transformers; action recognition; explainable AI; visual dictionary
DOI
10.1109/IV55152.2023.10186584
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Distracted driving is one of the leading causes of road accidents. With the recent introduction of advanced driver assistance systems and SAE Level 2 (L2) vehicles, the role of driver attention has gained renewed interest. It is imperative for vehicle manufacturers to develop robust systems that can identify distractions and help prevent such accidents in highly automated vehicles. This paper focuses on studying secondary driver behaviors and their relative complexity in order to develop a guide for auto manufacturers. In recent years, several driver secondary-action datasets and deep learning algorithms have been created to address this problem. Despite their success in many domains, Convolutional Neural Network (CNN)-based deep learning methods struggle to capture the overall context of an image and instead focus on specific local features. We present the use of Video Transformers on two challenging datasets, one of which is a low-quality grayscale dataset. We also demonstrate how the novel concept of a Visual Dictionary can be used to understand the structural components of any secondary behavior. Finally, we validate the different components of the visual dictionary by studying the attention modules of the transformer-based model, thereby incorporating explainability into the computer vision model. An activity is decomposed into multiple small actions and attributes, and the corresponding attention patches are highlighted in the input frame. Our code is available at github.com/VTTI/driver-secondary-action-recognition.
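The explainability step described in the abstract, highlighting the attention patches that correspond to each sub-action, can be illustrated with a generic attention-rollout computation (Abnar and Zuidema, 2020), which fuses per-layer self-attention maps and projects the [CLS] token's relevance back onto the input patch grid. The PyTorch sketch below is an illustrative assumption, not the authors' code from the linked repository; the function name, layer count, and patch-grid size are all hypothetical.

import torch

def attention_rollout(attentions):
    # Fuse per-layer self-attention maps into one relevance map
    # (attention rollout; Abnar & Zuidema, 2020).
    # attentions: list of (heads, tokens, tokens) tensors, first block first.
    result = torch.eye(attentions[0].size(-1))
    for attn in attentions:
        attn = attn.mean(dim=0)                       # average over attention heads
        attn = attn + torch.eye(attn.size(-1))        # account for residual connections
        attn = attn / attn.sum(dim=-1, keepdim=True)  # re-normalise each row
        result = attn @ result                        # compose with earlier layers
    return result[0, 1:]                              # [CLS] relevance to patch tokens

# Toy dimensions (hypothetical): 12 blocks, 8 heads,
# 1 [CLS] token + 14 x 14 = 196 spatial patches per frame.
torch.manual_seed(0)
attns = [torch.rand(8, 197, 197).softmax(dim=-1) for _ in range(12)]
heatmap = attention_rollout(attns).reshape(14, 14)
print(heatmap.shape)  # torch.Size([14, 14]); overlay on the frame's patch grid

In a video transformer, the same rollout can be run per frame (or over space-time tokens) and thresholded to highlight the patches supporting a predicted sub-action, which is the role attention-based validation of the visual dictionary plays in the paper.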
Pages: 8
Related Papers
50 records in total
  • [41] Human Activity Recognition for Video Surveillance using Sequences of Postures
    Htike, Kyaw Kyaw
    Khalifa, Othman O.
    Ramli, Huda Adibah Mohd
    Abushariah, Mohammad A. M.
    2014 THIRD INTERNATIONAL CONFERENCE ON E-TECHNOLOGIES AND NETWORKS FOR DEVELOPMENT (ICEND), 2014, : 79 - 82
  • [42] Abnormal Activity Detection Using Video Action Recognition: A Review
    Ojha, Abhushan
    Liu, Ying
    Hao, Yu
    2024 6TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING, ICNLP 2024, 2024, : 474 - 483
  • [43] Activity recognition from video sequences using declarative models
    Rota, NA
    Thonnat, M
    ECAI 2000: 14TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2000, 54 : 673 - 677
  • [44] Activity Recognition using Video Event Segmentation with Text (VEST)
    Holloway, Hillary
    Jones, Eric K.
    Kaluzniacki, Andrew
    Blasch, Erik
    Tierno, Jorge
    SIGNAL PROCESSING, SENSOR/INFORMATION FUSION, AND TARGET RECOGNITION XXIII, 2014, 9091
  • [45] Automated Video Behavior Recognition of Pigs Using Two-Stream Convolutional Networks
    Zhang, Kaifeng
    Li, Dan
    Huang, Jiayun
    Chen, Yifei
    SENSORS, 2020, 20 (04)
  • [47] Automated video analysis for action recognition using descriptors derived from optical acceleration
    Edison, Anitha
    Jiji, C. V.
    SIGNAL IMAGE AND VIDEO PROCESSING, 2019, 13 (05) : 915 - 922
  • [48] Measuring coupled oscillations using an automated video analysis technique based on image recognition
    Monsoriu, JA
    Giménez, MH
    Riera, J
    Vidaurre, A
    EUROPEAN JOURNAL OF PHYSICS, 2005, 26 (06) : 1149 - 1155
  • [49] Instructional Activity Recognition Using A Transformer Network with Multi-Semantic Attention
    Korban, Matthew
    Acton, Scott T.
    Youngs, Peter
    Foster, Jonathan
    2024 IEEE SOUTHWEST SYMPOSIUM ON IMAGE ANALYSIS AND INTERPRETATION, SSIAI, 2024, : 113 - 116
  • [50] Using Large Language Models to Compare Explainable Models for Smart Home Human Activity Recognition
    Fiori, Michele
    Civitarese, Gabriele
    Bettini, Claudio
    COMPANION OF THE 2024 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING, UBICOMP COMPANION 2024, 2024, : 881 - 884