MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings

Cited by: 0

Authors:

Madan, Surbhi [1]

Jain, Rishabh [1]

Sharma, Gulshan [1]

Subramanian, Ramanathan [2]

Dhall, Abhinav [1, 3]

Affiliations:

[1] Indian Inst Technol Ropar, Rupnagar, Punjab, India

[2] Univ Canberra, Canberra, ACT, Australia

[3] Monash Univ, Clayton, Vic, Australia

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Keywords:

Bodily Behavior; Multiview Attention; DCT; Transformer;

DOI:

10.1145/3581783.3612858

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Bodily behavioral language is an important social cue, and its automated analysis helps in enhancing the understanding of artificial intelligence systems. Furthermore, behavioral language cues are essential for active engagement in social agent-based user interactions. Despite the progress made in computer vision for tasks like head and body pose estimation, there is still a need to explore the detection of finer behaviors such as gesturing, grooming, or fumbling. This paper proposes a multiview attention fusion method named MAGIC-TBR that combines features extracted from videos and their corresponding Discrete Cosine Transform coefficients via a transformer-based approach. The experiments are conducted on the BBSI dataset and the results demonstrate the effectiveness of the proposed feature fusion with multiview attention. The code is available at: https://github.com/surbhimadan92/MAGIC-TBR


Pages: 9526-9530

Number of pages: 5
