DCMA-Net: dual cross-modal attention for fine-grained few-shot recognition

Cited by: 0
Authors
Yan Zhou
Xiao Ren
Jianxun Li
Yin Yang
Haibin Zhou
Affiliations
[1] Xiangtan University, School of Automation and Electronic Information
[2] Shanghai Jiao Tong University, School of Electronic Information and Electrical Engineering
[3] Xiangtan University, School of Mathematics and Computational Science
Source
Keywords
Few-shot learning; Fine-grained image recognition; Attention mechanism; Cross-modal fusion; Prototype
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Because obtaining comprehensive labeled samples is expensive, the fine-grained few-shot recognition task aims to identify unseen classes using only one or a few labeled samples. Fine-grained recognition also faces challenges such as minimal inter-class variation and background clutter, and most previous methods rely on a single visual modality. In this paper, we propose a novel Dual Cross-modal Attention Network (DCMA-Net) to address these problems. Concretely, we first propose the Local Mutuality Attention branch, which encodes contextual information by merging cross-modal information in order to learn more discriminative features and increase inter-class differences. We also add a regularization mechanism that filters the visual features matching the attribute information, to ensure effective learning. Because focusing on local features tends to ignore instance-level information, we further propose the Global Correlation Attention branch, which obtains a detail-activated representation by applying global pooling to the visual features serially along the spatial and channel dimensions. As the counterpart of the Local Mutuality Attention branch, this branch avoids learning bias. The outputs of the two branches are then aggregated into an integral feature embedding, which is used to enhance the prototypes. Extensive experiments on the CUB and SUN datasets demonstrate that our framework is effective. In particular, our method improves the accuracy of Prototype Network from 51.31 to 77.67 in the 5-way 1-shot setting on the CUB dataset with a Conv-4 backbone.
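The prototype-based classification step that the abstract builds on can be sketched as follows. This is a minimal illustration of generic prototype classification, not the authors' implementation: the function names (`aggregate`, `class_prototypes`, `classify`, `predict`) are hypothetical, and the element-wise sum used to combine the two branch outputs is an assumption, since the abstract does not say how the outputs are aggregated.

```python
import math
from collections import defaultdict


def aggregate(local_feat, global_feat):
    """Combine two branch outputs into one embedding.

    ASSUMPTION: element-wise sum. The abstract only says the outputs
    are "aggregated"; the real operator may differ.
    """
    return [a + b for a, b in zip(local_feat, global_feat)]


def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vec, label in zip(support_embeddings, support_labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Softmax over negative squared distances to every prototype."""
    labels = list(prototypes)
    logits = [-squared_distance(query, prototypes[lab]) for lab in labels]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {lab: e / total for lab, e in zip(labels, exps)}


def predict(query, prototypes):
    """Return the label of the most probable class for a query embedding."""
    probs = classify(query, prototypes)
    return max(probs, key=probs.get)


# Toy 3-way 1-shot episode with 2-dimensional embeddings (made-up values).
support = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
labels = ["class_a", "class_b", "class_c"]
prototypes = class_prototypes(support, labels)
print(predict([0.9, 0.1], prototypes))
```

In an N-way K-shot episode each class contributes K support embeddings, so the prototype is their mean; with K = 1 the prototype is simply the single support embedding, which is why a more discriminative embedding matters most in the 1-shot setting.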
Pages: 14521-14537
Page count: 16
Related papers
50 in total
  • [1] DCMA-Net: dual cross-modal attention for fine-grained few-shot recognition
    Zhou, Yan
    Ren, Xiao
    Li, Jianxun
    Yang, Yin
    Zhou, Haibin
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (05) : 14521 - 14537
  • [2] Dual Attention Networks for Few-Shot Fine-Grained Recognition
    Xu, Shu-Lin
    Zhang, Faen
    Wei, Xiu-Shen
    Wang, Jianhua
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 2911 - 2919
  • [3] A few-shot fine-grained image recognition method
    Wang, Jianwei
    Chen, Deyun
    [J]. BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2023, 71 (01)
  • [4] Attentive fine-grained recognition for cross-domain few-shot classification
    Sa, Liangbing
    Yu, Chongchong
    Ma, Xianqin
    Zhao, Xia
    Xie, Tao
    [J]. NEURAL COMPUTING & APPLICATIONS, 2022, 34 (06) : 4733 - 4746
  • [6] Learning attention-guided pyramidal features for few-shot fine-grained recognition
    Tang, Hao
    Yuan, Chengcheng
    Li, Zechao
    Tang, Jinhui
    [J]. Pattern Recognition, 2022, 130
  • [7] Multi-attention Meta Learning for Few-shot Fine-grained Image Recognition
    Zhu, Yaohui
    Liu, Chenlong
    Jiang, Shuqiang
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 1090 - 1096
  • [9] Few-shot activity recognition with cross-modal memory network
    Zhang, Lingling
    Chang, Xiaojun
    Liu, Jun
    Luo, Minnan
    Prakash, Mahesh
    Hauptmann, Alexander G.
    [J]. PATTERN RECOGNITION, 2020, 108
  • [10] Bi-channel attention meta learning for few-shot fine-grained image recognition
    Wang, Yao
    Ji, Yang
    Wang, Wei
    Wang, Bailing
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 242