Multimodal Vision Transformers with Forced Attention for Behavior Analysis

Cited: 4
Authors
Agrawal, Tanay [1 ]
Balazia, Michal [1 ]
Müller, Philipp [2]
Brémond, François [1]
Affiliations
[1] INRIA, Valbonne, France
[2] DFKI, Saarbrücken, Germany
Keywords
PERSONALITY; JUDGMENTS
DOI
10.1109/WACV56688.2023.00339
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Human behavior understanding requires looking at minute details in the large context of a scene, across multiple input modalities; such understanding is necessary for designing more human-like machines. While transformer approaches have shown great improvements, they face multiple challenges such as a lack of data and background noise. To tackle these, we introduce the Forced Attention (FAt) Transformer, which utilizes forced attention with a modified backbone for input encoding and makes use of additional inputs. Besides improving performance on different tasks and inputs, the modification requires less time and memory. We provide a model for generalised feature extraction for tasks concerning social signals and behavior analysis. Our focus is on understanding behavior in videos where people are interacting with each other or talking to the camera, which simulates the first-person point of view in social interaction. FAt Transformers are applied to two downstream tasks: personality recognition and body language recognition. We achieve state-of-the-art results on the UDIVA v0.5, First Impressions v2 and MPII Group Interaction datasets. We further provide an extensive ablation study of the proposed architecture.
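The abstract describes forcing attention via additional inputs but does not spell out the mechanism. Below is a minimal PyTorch sketch of one plausible reading, in which attention logits are biased so that patch tokens falling outside a person segmentation mask are suppressed. The module name ForcedAttention, the per-patch mask semantics, and the bias constant are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of a forced-attention layer (not the authors' code).
# Assumption: "forcing" attention means adding a large negative bias to the
# attention logits of keys whose patches lie outside a person segmentation mask.
import torch
import torch.nn as nn

class ForcedAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # x:    (B, N, D) patch tokens
        # mask: (B, N) in [0, 1]; 1 = patch overlaps the person segmentation
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each (B, H, N, d)
        attn = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        # Force attention: suppress keys outside the mask before the softmax.
        bias = (1.0 - mask)[:, None, None, :] * -1e4     # (B, 1, 1, N)
        attn = (attn + bias).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.proj(out)

# Usage: 196 tokens from a 14x14 patch grid, with a random person mask.
tokens = torch.randn(2, 196, 768)
person_mask = (torch.rand(2, 196) > 0.5).float()
layer = ForcedAttention(dim=768)
print(layer(tokens, person_mask).shape)  # torch.Size([2, 196, 768])

A soft mask (fractional overlap per patch) would work equally well here, since the bias scales with how far each patch falls outside the segmented region.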
Pages: 3381-3391
Number of pages: 11