DCMA-Net: dual cross-modal attention for fine-grained few-shot recognition

Cited by: 0

Authors:

Yan Zhou

Xiao Ren

Jianxun Li

Yin Yang

Haibin Zhou

Affiliations:

[1] Xiangtan University, School of Automation and Electronic Information

[2] Shanghai Jiao Tong University, School of Electronic Information and Electrical Engineering

[3] Xiangtan University, School of Mathematics and Computational Science

Source:

Multimedia Tools and Applications | 2024 / Vol. 83, No. 5

Keywords:

Few-shot learning; Fine-grained image recognition; Attention mechanism; Cross-modal fusion; Prototype

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Because comprehensively labeled samples are expensive to obtain, the fine-grained few-shot recognition task aims to recognize unseen classes using only one or a few labeled samples. Fine-grained recognition also faces challenges such as minimal inter-class variation and background clutter, and most previous methods rely on the visual modality alone. In this paper, we propose a novel Dual Cross-modal Attention Network (DCMA-Net) to address these problems. Concretely, we first propose the Local Mutuality Attention branch, which encodes contextual information by merging cross-modal information in order to learn more discriminative features and increase inter-class differences. We also add a regularization mechanism that filters for the visual features matching the attribute information, to ensure that learning is effective. Because focusing on local features tends to neglect instance-level information, we further propose the Global Correlation Attention branch, which obtains a detail-activated representation by globally pooling the visual features serially along the spatial and channel dimensions. As the counterpart of the Local Mutuality Attention branch, this branch avoids learning bias. The outputs of the two branches are then aggregated into an integral feature embedding, which is used to enhance the prototypes. Extensive experiments on the CUB and SUN datasets demonstrate that our framework is effective. In particular, our method improves the accuracy of Prototype Network from 51.31 to 77.67 in the 5-way 1-shot setting on the CUB dataset with a Conv-4 backbone.
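For readers unfamiliar with the prototype baseline that the abstract compares against, the following is a minimal sketch of generic nearest-prototype classification in a few-shot episode. It is not the DCMA-Net implementation: the function names, the toy class labels, and the 2-D embeddings are all hypothetical, and the attention branches and cross-modal fusion described above are not modeled here. It assumes each class prototype is the mean of that class's support embeddings and that a query is assigned to the prototype with the smallest squared Euclidean distance.

```python
from collections import defaultdict


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype.

    `support` is a list of (label, embedding) pairs; embeddings are
    equal-length lists of floats (hypothetical toy features).
    """
    grouped = defaultdict(list)
    for label, emb in support:
        grouped[label].append(emb)
    return {
        label: [sum(dim) / len(embs) for dim in zip(*embs)]
        for label, embs in grouped.items()
    }


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Return the label of the prototype nearest to the query embedding."""
    return min(
        prototypes,
        key=lambda label: squared_distance(query, prototypes[label]),
    )


# Toy 2-way 2-shot episode with made-up 2-D embeddings (assumption).
support = [
    ("sparrow", [1.0, 0.0]),
    ("sparrow", [0.8, 0.2]),
    ("finch", [0.0, 1.0]),
    ("finch", [0.2, 0.8]),
]
prototypes = class_prototypes(support)
prediction = classify([0.7, 0.1], prototypes)
```

In this toy episode the query lies closer to the "sparrow" prototype than to the "finch" prototype. A method that enhances prototypes, as the abstract describes, would change how the embeddings or prototypes are produced, while this final nearest-prototype step could stay the same.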

Pages: 14521-14537

Page count: 16
