MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings

Cited by: 0
Authors
Madan, Surbhi [1]
Jain, Rishabh [1]
Sharma, Gulshan [1]
Subramanian, Ramanathan [2]
Dhall, Abhinav [1, 3]
Affiliations
[1] Indian Inst Technol Ropar, Rupnagar, Punjab, India
[2] Univ Canberra, Canberra, ACT, Australia
[3] Monash Univ, Clayton, Vic, Australia
Keywords
Bodily Behavior; Multiview Attention; DCT; Transformer
DOI
10.1145/3581783.3612858
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bodily behavioral language is an important social cue, and its automated analysis helps in enhancing the understanding of artificial intelligence systems. Furthermore, behavioral language cues are essential for active engagement in social agent-based user interactions. Despite the progress made in computer vision for tasks like head and body pose estimation, there is still a need to explore the detection of finer behaviors such as gesturing, grooming, or fumbling. This paper proposes a multiview attention fusion method named MAGIC-TBR that combines features extracted from videos and their corresponding Discrete Cosine Transform coefficients via a transformer-based approach. The experiments are conducted on the BBSI dataset and the results demonstrate the effectiveness of the proposed feature fusion with multiview attention. The code is available at: https://github.com/surbhimadan92/MAGIC-TBR
Pages: 9526-9530
Page count: 5
Related papers
50 items in total
  • [41] Group Behavior Recognition Using Attention- and Graph-Based Neural Networks
    Yang, Fangkai
    Yin, Wenjie
    Inamura, Tetsunari
    Bjorkman, Marten
    Peters, Christopher
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 1626 - 1633
  • [42] MSF-TransUNet: A Multi-Scale Feature Fusion Transformer-Based U-Net for Medical Image Segmentation with Uniform Attention
    Jiang, Ying
    Gong, Lejun
    Huang, Hao
    Qi, Mingming
    Traitement du Signal, 2025, 42 (01) : 531 - 540
  • [43] Multimodal Integration of Mel Spectrograms and Text Transcripts for Enhanced Automatic Speech Recognition: Leveraging Extractive Transformer-Based Approaches and Late Fusion Strategies
    Mehra, Sunakshi
    Ranga, Virender
    Agarwal, Ritu
    Computational Intelligence, 2024, 40 (06)
  • [44] CMACF: Transformer-based cross-modal attention cross-fusion model for systemic lupus erythematosus diagnosis combining Raman spectroscopy, FTIR spectroscopy, and metabolomics
    Zhou, Xuguang
    Chen, Chen
    Lv, Xiaoyi
    Zuo, Enguang
    Li, Min
    Wu, Lijun
    Chen, Xiaomei
    Wu, Xue
    Chen, Cheng
    INFORMATION PROCESSING & MANAGEMENT, 2024, 61 (06)
  • [45] Group Non-Critical Behavior Recognition Based on Joint Attention Mechanism of Sensor Data and Semantic Domain
    Li, Chen
    Liu, Baoluo
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (07) : 608 - 617
  • [46] Dynamic graph convolutional network for assembly behavior recognition based on attention mechanism and multi-scale feature fusion
    Chen, Chengjun
    Zhao, Xicong
    Wang, Jinlei
    Li, Dongnian
    Guan, Yuanlin
    Hong, Jun
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [47] Dynamic graph convolutional network for assembly behavior recognition based on attention mechanism and multi-scale feature fusion
    Chengjun Chen
    Xicong Zhao
    Jinlei Wang
    Dongnian Li
    Yuanlin Guan
    Jun Hong
    Scientific Reports, 12
  • [48] Improving Seismic Fault Recognition with Self-Supervised Pre-Training: A Study of 3D Transformer-Based with Multi-Scale Decoding and Fusion
    Zhang, Zeren
    Chen, Ran
    Ma, Jinwen
    REMOTE SENSING, 2024, 16 (05)
  • [49] A multimodal fusion-based deep learning framework combined with keyframe extraction and spatial and channel attention for group emotion recognition from videos
    Qi, Shubao
    Liu, Baolin
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 1493 - 1503
  • [50] A multimodal fusion-based deep learning framework combined with keyframe extraction and spatial and channel attention for group emotion recognition from videos
    Shubao Qi
    Baolin Liu
    Pattern Analysis and Applications, 2023, 26 (3) : 1493 - 1503