MAGIC-TBR: Multiview Attention Fusion for Transformer-based Bodily Behavior Recognition in Group Settings

Citations: 0

Authors:

Madan, Surbhi [1]

Jain, Rishabh [1]

Sharma, Gulshan [1]

Subramanian, Ramanathan [2]

Dhall, Abhinav [1,3]

Affiliations:

[1] Indian Inst Technol Ropar, Rupnagar, Punjab, India

[2] Univ Canberra, Canberra, ACT, Australia

[3] Monash Univ, Clayton, Vic, Australia

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Keywords:

Bodily Behavior; Multiview Attention; DCT; Transformer;

DOI:

10.1145/3581783.3612858

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Bodily behavioral language is an important social cue, and its automated analysis helps in enhancing the understanding of artificial intelligence systems. Furthermore, behavioral language cues are essential for active engagement in social agent-based user interactions. Despite the progress made in computer vision for tasks like head and body pose estimation, there is still a need to explore the detection of finer behaviors such as gesturing, grooming, or fumbling. This paper proposes a multiview attention fusion method named MAGIC-TBR that combines features extracted from videos and their corresponding Discrete Cosine Transform coefficients via a transformer-based approach. The experiments are conducted on the BBSI dataset and the results demonstrate the effectiveness of the proposed feature fusion with multiview attention. The code is available at: https://github.com/surbhimadan92/MAGIC-TBR

Citation

Pages: 9526-9530

Page count: 5

50 items in total

[31] Multiview Feature Fusion Attention Convolutional Recurrent Neural Networks for EEG-Based Emotion Recognition
Xin, Ruihao
Miao, Fengbo
Cong, Ping
Zhang, Fan
Xin, Yongxian
Feng, Xin
JOURNAL OF SENSORS, 2023, 2023
[32] PlaceFormer: Transformer-Based Visual Place Recognition Using Multi-Scale Patch Selection and Fusion
Kannan, Shyam Sundar
Min, Byung-Cheol
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (07): 6552-6559
[33] A Transformer-Based Image-Guided Depth-Completion Model with Dual-Attention Fusion Module
Wang, Shuling
Jiang, Fengze
Gong, Xiaojin
Sensors, 2024, 24 (19)
[34] Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition
Lohrenz, Timo
Li, Zhengyang
Fingscheidt, Tim
INTERSPEECH 2021, 2021: 2846-2850
[35] Multi-Label Multimodal Emotion Recognition With Transformer-Based Fusion and Emotion-Level Representation Learning
Le, Hoai-Duy
Lee, Guee-Sang
Kim, Soo-Hyung
Kim, Seungwon
Yang, Hyung-Jeong
IEEE ACCESS, 2023, 11: 14742-14751
[36] Worker behavior recognition based on temporal and spatial self-attention of vision Transformer
Lu Y.-X.
Xu G.-H.
Tang B.
Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2023, 57 (03): 446-454
[37] Cattle behavior recognition based on feature fusion under a dual attention mechanism
Shang, Cheng
Wu, Feng
Wang, MeiLi
Gao, Qiang
JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2022, 85
[38] ViT-LLMR: Vision Transformer-based lower limb motion recognition from fusion signals of MMG and IMU
Zhang, Hanyang
Yang, Ke
Cao, Gangsheng
Xia, Chunming
BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 82
[39] Transformer-based Self-supervised Representation Learning for Emotion Recognition Using Bio-signal Feature Fusion
Sawant, Shrutika S.
Erick, F. X.
Arora, Pulkit
Pahl, Jaspar
Foltyn, Andreas
Holzer, Nina
Gotz, Theresa
2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023
[40] Group Gated Fusion on Attention-based Bidirectional Alignment for Multimodal Emotion Recognition
Liu, Pengfei
Li, Kun
Meng, Helen
INTERSPEECH 2020, 2020: 379-383
