Multimodal Vision Transformers with Forced Attention for Behavior Analysis

Cited by: 4
Authors
Agrawal, Tanay [1]
Balazia, Michal [1]
Müller, Philipp [2]
Brémond, François [1]
Affiliations
[1] INRIA, Valbonne, France
[2] DFKI, Saarbrücken, Germany
Keywords
PERSONALITY; JUDGMENTS
DOI
10.1109/WACV56688.2023.00339
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Human behavior understanding requires looking at minute details within the large context of a scene containing multiple input modalities, and is necessary for designing more human-like machines. While transformer approaches have shown great improvements, they face multiple challenges such as lack of data and background noise. To tackle these, we introduce the Forced Attention (FAt) Transformer, which applies forced attention with a modified backbone for input encoding and makes use of additional inputs. Besides improving performance across different tasks and inputs, the modification requires less time and memory. We provide a model for generalised feature extraction for tasks concerning social signals and behavior analysis. Our focus is on understanding behavior in videos where people are interacting with each other or talking into the camera, which simulates the first-person point of view in social interaction. FAt Transformers are applied to two downstream tasks: personality recognition and body language recognition. We achieve state-of-the-art results on the UDIVA v0.5, First Impressions v2 and MPII Group Interaction datasets. We further provide an extensive ablation study of the proposed architecture.
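The abstract describes "forced attention" only at a high level: attention is steered toward task-relevant regions (e.g. patches covering a detected face or body). As a rough illustration of that general idea, below is a minimal NumPy sketch that biases attention logits toward flagged tokens via an additive term; the function name, the additive-bias formulation, and the `bias` value are assumptions for illustration, not the paper's actual mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def forced_attention(q, k, v, region_mask, bias=4.0):
    """Scaled dot-product attention with an additive bias that steers
    attention toward key tokens flagged by region_mask (1 = relevant
    region, 0 = background). A hypothetical sketch of the general idea.
    q: (Tq, d), k/v: (Tk, d), region_mask: (Tk,)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)          # (Tq, Tk) attention logits
    scores = scores + bias * region_mask   # boost logits of flagged tokens
    weights = softmax(scores, axis=-1)     # rows sum to 1
    return weights @ v, weights
```

With a positive bias, each query strictly increases the attention weight it assigns to the flagged tokens relative to the unbiased case, which is the intended "forcing" effect.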
Pages: 3381 - 3391
Page count: 11