MA-VLAD: a fine-grained local feature aggregation scheme for action recognition

Cited by: 2

Authors:

Feng, Na [1]

Tang, Ying [1]

Song, Zikai [1]

Yu, Junqing [1]

Chen, Yi-Ping Phoebe [2]

Yang, Wei [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China

[2] La Trobe Univ, Dept Comp Sci & Informat Technol, Bundoora, Vic 3086, Australia

Source:

MULTIMEDIA SYSTEMS | 2024 / Vol. 30 / Issue 3

Keywords:

VLAD; Local feature aggregation; Attention; Action recognition;

DOI:

10.1007/s00530-024-01341-9

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

A recent trend in action recognition involves aggregating local features into a more compact representation to eliminate redundancy in video features while retaining essential components for recognition. An exemplary approach is NetVLAD and its variations, which learn cluster centers for local features and represent them as VLAD descriptors. However, these methods process multi-frame features in a generic and straightforward manner, while overlooking the intricate semantic shifts within consecutive frames. More specifically, they fail to acknowledge that a pivotal aspect of events/actions is the local dynamics of semantic entities. In this paper, we propose Multi-head Attention Modularized VLAD (MA-VLAD) for fine-grained semantic-inclination clustering of features, enhancing VLAD descriptors with a strong local focusing capability. Specifically, we utilize a multi-head mechanism to partition the input features along the channel dimension, and integrate it with the attention mechanism to conduct fine-grained clustering. Additionally, to consolidate temporal information for enhanced recognition, we utilize temporal position embeddings to address order-sensitive events/actions. Our MA-VLAD delivers more dependable video representations than some of the most widely used and potent methods. Extensive experiments on UCF101, HMDB51, and SoccerNet-v2 datasets demonstrate that our MA-VLAD achieves state-of-the-art performance, underscoring its effectiveness.
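The channel-partitioned aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes soft assignment by scaled dot-product scores against fixed cluster centers, uses random values in place of learned parameters, and omits the temporal position embeddings. The names `soft_vlad`, `multi_head_vlad`, and the sizes `T`, `D`, `H`, `K` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def soft_vlad(features, centers):
    """Soft-assignment VLAD for one head.

    features: list of N local descriptors, each of length d
    centers:  list of K cluster centers, each of length d
    Returns a flat list of length K * d.
    """
    d = len(centers[0])
    vlad = [[0.0] * d for _ in centers]
    for x in features:
        # Assumption: assignment scores are dot products with the centers,
        # scaled by 1/sqrt(d) as in scaled dot-product attention.
        scores = [sum(a * b for a, b in zip(x, c)) / math.sqrt(d) for c in centers]
        weights = softmax(scores)
        for k, (w, c) in enumerate(zip(weights, centers)):
            for j in range(d):
                # Accumulate the weighted residual between feature and center.
                vlad[k][j] += w * (x[j] - c[j])
    # Intra-normalize each cluster's residual sum, then flatten.
    return [v for row in vlad for v in l2_normalize(row)]


def multi_head_vlad(features, head_centers):
    """Split channels into equal groups (one per head) and aggregate each group."""
    num_heads = len(head_centers)
    dim = len(features[0])
    assert dim % num_heads == 0, "channel dimension must divide evenly"
    d = dim // num_heads
    out = []
    for h, centers in enumerate(head_centers):
        chunk = [x[h * d:(h + 1) * d] for x in features]
        out.extend(soft_vlad(chunk, centers))
    return l2_normalize(out)


random.seed(0)
T, D, H, K = 8, 16, 4, 3  # frames, channels, heads, clusters per head (illustrative)
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
centers = [[[random.gauss(0, 1) for _ in range(D // H)] for _ in range(K)]
           for _ in range(H)]
descriptor = multi_head_vlad(frames, centers)
print(len(descriptor))  # H * K * (D // H) = 48
```

Each head clusters only its own slice of the channels, so the final descriptor concatenates H independent VLAD blocks. In a trained model the centers and assignment scores would be learned rather than sampled at random.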

Pages: 13
