MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings

Cited by: 0
Authors
Madan, Surbhi [1]
Jain, Rishabh [1]
Sharma, Gulshan [1]
Subramanian, Ramanathan [2]
Dhall, Abhinav [1, 3]
Affiliations
[1] Indian Inst Technol Ropar, Rupnagar, Punjab, India
[2] Univ Canberra, Canberra, ACT, Australia
[3] Monash Univ, Clayton, Vic, Australia
Keywords
Bodily Behavior; Multiview Attention; DCT; Transformer
DOI
10.1145/3581783.3612858
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Bodily behavioral language is an important social cue, and its automated analysis helps in enhancing the understanding of artificial intelligence systems. Furthermore, behavioral language cues are essential for active engagement in social agent-based user interactions. Despite the progress made in computer vision for tasks like head and body pose estimation, there is still a need to explore the detection of finer behaviors such as gesturing, grooming, or fumbling. This paper proposes a multiview attention fusion method named MAGIC-TBR that combines features extracted from videos and their corresponding Discrete Cosine Transform coefficients via a transformer-based approach. The experiments are conducted on the BBSI dataset and the results demonstrate the effectiveness of the proposed feature fusion with multiview attention. The code is available at: https://github.com/surbhimadan92/MAGIC-TBR
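The abstract names two ingredients, Discrete Cosine Transform coefficients and attention-based feature fusion, without giving details. The following is only a minimal NumPy sketch of those two generic techniques and is not the authors' implementation (see the linked repository for that). The function names, token shapes, random inputs, and the choice to concatenate the attended features are all assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return m

def dct2(frame):
    """2D DCT-II of a single-channel frame (rows, then columns)."""
    h, w = frame.shape
    return dct_matrix(h) @ frame @ dct_matrix(w).T

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Scaled dot-product attention without learned projections.

    Each query token attends over all key/value tokens.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ keys_values, weights

# Hypothetical inputs: one 8x8 grayscale frame and two sets of feature tokens.
rng = np.random.default_rng(0)
frame = rng.random((8, 8))
coeffs = dct2(frame)                        # frequency-domain view of the frame

rgb_tokens = rng.standard_normal((4, 16))   # stand-in for video features
dct_tokens = rng.standard_normal((6, 16))   # stand-in for DCT-derived features

# Let the video tokens attend to the DCT tokens, then fuse by concatenation.
attended, weights = cross_attention(rgb_tokens, dct_tokens)
fused = np.concatenate([rgb_tokens, attended], axis=-1)
print(coeffs.shape, fused.shape)
```

A real transformer would add learned query, key, and value projections, multiple heads, and residual connections; they are omitted here to keep the sketch short.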
Pages: 9526-9530
Page count: 5