MA-VLAD: a fine-grained local feature aggregation scheme for action recognition

Cited by: 2
Authors
Feng, Na [1]
Tang, Ying [1]
Song, Zikai [1]
Yu, Junqing [1]
Chen, Yi-Ping Phoebe [2]
Yang, Wei [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[2] La Trobe Univ, Dept Comp Sci & Informat Technol, Bundoora, Vic 3086, Australia
Keywords
VLAD; Local feature aggregation; Attention; Action recognition
DOI
10.1007/s00530-024-01341-9
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A recent trend in action recognition involves aggregating local features into a more compact representation to eliminate redundancy in video features while retaining essential components for recognition. An exemplary approach is NetVLAD and its variations, which learn cluster centers for local features and represent them as VLAD descriptors. However, these methods process multi-frame features in a generic and straightforward manner, while overlooking the intricate semantic shifts within consecutive frames. More specifically, they fail to acknowledge that a pivotal aspect of events/actions is the local dynamics of semantic entities. In this paper, we propose Multi-head Attention Modularized VLAD (MA-VLAD) for fine-grained semantic-inclination clustering of features, enhancing VLAD descriptors with a strong local focusing capability. Specifically, we utilize a multi-head mechanism to partition the input features along the channel dimension, and integrate it with the attention mechanism to conduct fine-grained clustering. Additionally, to consolidate temporal information for enhanced recognition, we utilize temporal position embeddings to address order-sensitive events/actions. Our MA-VLAD delivers more dependable video representations than some of the most widely used and potent methods. Extensive experiments on UCF101, HMDB51, and SoccerNet-v2 datasets demonstrate that our MA-VLAD achieves state-of-the-art performance, underscoring its effectiveness.
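The aggregation idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`multi_head_vlad`, `sinusoidal_positions`), the sinusoidal form of the position embedding, the scaled dot-product soft assignment, and the random cluster centers are all assumptions made for illustration. The sketch only shows the general pattern of adding temporal positions, splitting features along the channel dimension into heads, and building one soft-assignment VLAD descriptor per head.

```python
import numpy as np

def sinusoidal_positions(num_frames, dim):
    """Sinusoidal position embedding (an assumed choice, not from the paper)."""
    pos = np.arange(num_frames)[:, None]
    idx = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (idx // 2)) / dim)
    return np.where(idx % 2 == 0, np.sin(angle), np.cos(angle))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_vlad(features, centers):
    """Hypothetical multi-head VLAD aggregation.

    features: (T, D) local features, one row per frame.
    centers:  (H, K, D // H) cluster centers, K per head (learned in practice,
              random in this sketch).
    Returns a flat, L2-normalized descriptor of length H * K * (D // H).
    """
    num_frames, dim = features.shape
    num_heads, num_clusters, head_dim = centers.shape
    assert dim == num_heads * head_dim, "D must equal H * head_dim"

    # Inject temporal order so the aggregation is not permutation-invariant.
    x = features + sinusoidal_positions(num_frames, dim)

    # Partition along the channel dimension: (T, D) -> (H, T, head_dim).
    heads = x.reshape(num_frames, num_heads, head_dim).transpose(1, 0, 2)

    descriptors = []
    for x_h, c_h in zip(heads, centers):
        # Attention-style soft assignment of each frame to each center: (T, K).
        assign = softmax(x_h @ c_h.T / np.sqrt(head_dim), axis=1)
        # Residuals between every frame and every center: (T, K, head_dim).
        residuals = x_h[:, None, :] - c_h[None, :, :]
        # Weighted sum of residuals over frames gives the VLAD matrix: (K, head_dim).
        vlad = (assign[:, :, None] * residuals).sum(axis=0)
        # Intra-normalization: L2-normalize each cluster's residual vector.
        vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12)
        descriptors.append(vlad.ravel())

    out = np.concatenate(descriptors)
    return out / (np.linalg.norm(out) + 1e-12)

# Usage with made-up sizes: 16 frames, 64 channels, 4 heads, 8 clusters per head.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 64))
centers = rng.normal(size=(4, 8, 16))
descriptor = multi_head_vlad(features, centers)
print(descriptor.shape)
```

Because each head sees only its own slice of the channels, its clusters can specialize on a narrower part of the feature space, which is one plausible reading of the "fine-grained" clustering the abstract refers to. Without the position embedding, the weighted sum over frames would give the same descriptor for any frame order.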
Pages: 13
Related papers
50 in total
  • [1] Deep convolutional feature aggregation for fine-grained cultivar recognition
    Wu, Hao
    Fang, Lincong
    Yu, Qian
    Yang, Chengzhuan
    KNOWLEDGE-BASED SYSTEMS, 2023, 275
  • [2] Fine-Grained Recognition via Attribute-Guided Attentive Feature Aggregation
    Yan, Yichao
    Ni, Bingbing
    Yang, Xiaokang
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1032 - 1040
  • [3] Few-shot fine-grained recognition in remote sensing ship images with global and local feature aggregation
    Zhou, Guoqing
    Huang, Liang
    Zhang, Xianfeng
    ADVANCES IN SPACE RESEARCH, 2024, 74 (08) : 3735 - 3748
  • [4] Fine-Grained Crowdsourcing for Fine-Grained Recognition
    Deng, Jia
    Krause, Jonathan
    Fei-Fei, Li
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 580 - 587
  • [5] TaiChi: A Fine-Grained Action Recognition Dataset
    Sun, Shan
    Wang, Feng
    Liang, Qi
    He, Liang
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, : 434 - 438
  • [6] Fine-Grained Obfuscation Scheme Recognition on Binary Code
    Tian, Zhenzhou
    Mao, Hengchao
    Huang, Yaqian
    Tian, Jie
    Li, Jinrui
    DIGITAL FORENSICS AND CYBER CRIME, ICDF2C 2021, 2022, 441 : 215 - 228
  • [7] Learning Convolutional Action Primitives for Fine-grained Action Recognition
    Lea, Colin
    Vidal, Rene
    Hager, Gregory D.
    2016 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2016, : 1642 - 1649
  • [8] Fine-grained Action Recognition using Attribute Vectors
    Yenduri, Sravani
    Perveen, Nazil
    Chalavadi, Vishnu
    Mohan, C. Krishna
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2022, : 134 - 143
  • [9] Convolutional transformer network for fine-grained action recognition
    Ma, Yujun
    Wang, Ruili
    Zong, Ming
    Ji, Wanting
    Wang, Yi
    Ye, Baoliu
    NEUROCOMPUTING, 2024, 569
  • [10] FINE-GRAINED ACTION RECOGNITION ON A NOVEL BASKETBALL DATASET
    Gu, Xiaofan
    Xue, Xinwei
    Wang, Feng
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 2563 - 2567