MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings

Authors
Madan, Surbhi [1]
Jain, Rishabh [1]
Sharma, Gulshan [1]
Subramanian, Ramanathan [2]
Dhall, Abhinav [1, 3]
Affiliations
[1] Indian Inst Technol Ropar, Rupnagar, Punjab, India
[2] Univ Canberra, Canberra, ACT, Australia
[3] Monash Univ, Clayton, Vic, Australia
Keywords
Bodily Behavior; Multiview Attention; DCT; Transformer;
DOI
10.1145/3581783.3612858
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Bodily behavioral language is an important social cue, and its automated analysis helps in enhancing the understanding of artificial intelligence systems. Furthermore, behavioral language cues are essential for active engagement in social agent-based user interactions. Despite the progress made in computer vision for tasks like head and body pose estimation, there is still a need to explore the detection of finer behaviors such as gesturing, grooming, or fumbling. This paper proposes a multiview attention fusion method named MAGIC-TBR that combines features extracted from videos and their corresponding Discrete Cosine Transform coefficients via a transformer-based approach. The experiments are conducted on the BBSI dataset and the results demonstrate the effectiveness of the proposed feature fusion with multiview attention. The code is available at: https://github.com/surbhimadan92/MAGIC-TBR
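The abstract does not spell out the architecture, so the following is only a generic illustration of the two ingredients it names: Discrete Cosine Transform coefficients of a frame, and attention-based fusion of two feature streams. Every function name, shape, and the mean-pooling step is an assumption made for this sketch; it is not the authors' implementation, which is available at the repository linked above.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)    # rescale the DC row so the matrix is orthonormal
    return m

def dct2(frame: np.ndarray) -> np.ndarray:
    """2-D DCT-II of a single grayscale frame of shape (H, W)."""
    h, w = frame.shape
    return dct_matrix(h) @ frame @ dct_matrix(w).T

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats: np.ndarray, context_feats: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention without learned projections (illustrative only)."""
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ context_feats

def fuse(rgb_feats: np.ndarray, dct_feats: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: each stream attends to the other, then the token
    averages of both attended streams are concatenated."""
    rgb_attended = cross_attention(rgb_feats, dct_feats)
    dct_attended = cross_attention(dct_feats, rgb_feats)
    return np.concatenate([rgb_attended.mean(axis=0), dct_attended.mean(axis=0)])
```

For example, with 5 video-stream tokens and 7 DCT-stream tokens of width 16, `fuse` returns a single vector of length 32 that a classifier head could consume. A trained model would add learned query, key, and value projections, which this sketch leaves out.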
Pages: 9526-9530
Number of pages: 5