Cross-modal fusion for multi-label image classification with attention mechanism

Times cited: 16

Authors:

Wang, Yangtao [1]

Xie, Yanzhao [2]

Zeng, Jiangfeng [3]

Wang, Hanpin [1]

Fan, Lisheng [1]

Song, Yufan [4]

Affiliations:

[1] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou, Peoples R China

[2] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Peoples R China

[3] Cent China Normal Univ, Sch Informat Management, Wuhan, Peoples R China

[4] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2022 / Vol. 101

Funding:

National Natural Science Foundation of China;

Keywords:

Graph convolution network; Attention mechanism; Cross-modal fusion; Multi-label image classification;

DOI:

10.1016/j.compeleceng.2022.108002

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

For multi-label image classification, existing studies either use a suboptimal multi-step training workflow to explore the (local) relationships between image target regions and their corresponding labels with an attention mechanism, or model the (global) label dependencies via a graph convolution network (GCN) but fail to fuse the image features and label word vectors efficiently. To address these problems, we develop Cross-modal Fusion for Multi-label Image Classification with attention mechanism (termed CFMIC), which combines an attention mechanism and GCN to capture local and global label dependencies simultaneously in an end-to-end manner. CFMIC contains three key modules: (1) a feature extraction module with an attention mechanism, which generates an accurate feature for each input image by focusing on the relationships between image labels and image target regions; (2) a label co-occurrence embedding learning module, which uses GCN to learn the relationships between different objects and generate label co-occurrence embeddings; and (3) a cross-modal fusion module with Multi-modal Factorized Bilinear pooling (termed MFB), which efficiently fuses the above image features and label co-occurrence embeddings. Extensive experiments on MS-COCO and VOC2007 verify that CFMIC greatly improves convergence efficiency and produces better classification results than state-of-the-art approaches.
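The three modules named in the abstract can be illustrated with a minimal NumPy sketch. This is not the CFMIC implementation; it rests on several assumptions: the attention step is a generic single-query soft attention over spatial regions (not the paper's label-to-region attention), the GCN uses the common symmetric normalization D^{-1/2}(A + I)D^{-1/2} with LeakyReLU, MFB follows the factorized bilinear pooling of Yu et al. (sum pooling over k factors, then power and l2 normalization), and the per-label scoring head `label_scores`, all dimensions, and all random weights and graphs are hypothetical stand-ins.

```python
import numpy as np

def attention_pool(regions, w_att):
    # Generic soft attention: score each spatial region, softmax, weighted sum.
    s = regions @ w_att
    a = np.exp(s - s.max())
    a /= a.sum()
    return a @ regions

def normalize_adjacency(A):
    # Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}.
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(H, A_norm, W):
    # One graph convolution over the label graph, followed by LeakyReLU(0.2).
    Z = A_norm @ H @ W
    return np.where(Z > 0, Z, 0.2 * Z)

def mfb(x, y, U, V, k):
    # MFB: project both inputs to k*o dims, multiply elementwise, sum-pool
    # each consecutive group of k factors, then power- and l2-normalize.
    joint = (x @ U) * (y @ V)
    z = joint.reshape(-1, k).sum(axis=1)
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + 1e-12)

def label_scores(img_feat, label_emb, U, V, k, p):
    # Hypothetical head: fuse the image feature with each label embedding, project to one logit.
    return np.array([mfb(img_feat, e, U, V, k) @ p for e in label_emb])

rng = np.random.default_rng(0)
C, R, d_word, d_hid, d_img, k, o = 5, 49, 300, 64, 2048, 5, 16  # toy sizes (assumed)

# (1) Attention-pooled image feature from a random 7x7 grid of region features.
regions = rng.standard_normal((R, d_img))
img_feat = attention_pool(regions, rng.standard_normal(d_img) * 0.01)

# (2) Two GCN layers over a random symmetric co-occurrence graph and random word vectors.
A = (rng.random((C, C)) > 0.6).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0.0)
A_norm = normalize_adjacency(A)
words = rng.standard_normal((C, d_word))
H1 = gcn_layer(words, A_norm, rng.standard_normal((d_word, d_hid)) * 0.05)
label_emb = gcn_layer(H1, A_norm, rng.standard_normal((d_hid, d_hid)) * 0.05)

# (3) MFB fusion per label, then an independent sigmoid per label (multi-label output).
U = rng.standard_normal((d_img, k * o)) * 0.02
V = rng.standard_normal((d_hid, k * o)) * 0.02
p = rng.standard_normal(o)
probs = 1.0 / (1.0 + np.exp(-label_scores(img_feat, label_emb, U, V, k, p)))
print(probs.shape)  # (5,)
```

Only the shapes and the data flow are meaningful here; in a trained model every weight would be learned end to end. The sum pooling over k factors is what lets MFB approximate a full bilinear interaction through two low-rank projections, which is far cheaper than an explicit outer-product fusion of a 2048-d image feature with a label embedding.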


Pages: 12
