Cross-modal fusion for multi-label image classification with attention mechanism

Cited by: 16
|
Authors
Wang, Yangtao [1 ]
Xie, Yanzhao [2 ]
Zeng, Jiangfeng [3 ]
Wang, Hanpin [1 ]
Fan, Lisheng [1 ]
Song, Yufan [4 ]
Affiliations
[1] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou, Peoples R China
[2] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Peoples R China
[3] Cent China Normal Univ, Sch Informat Management, Wuhan, Peoples R China
[4] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Graph convolution network; Attention mechanism; Cross-modal fusion; Multi-label image classification;
DOI
10.1016/j.compeleceng.2022.108002
CLC number
TP3 [Computing Technology, Computer Technology];
Discipline code
0812;
Abstract
For multi-label image classification, existing studies either rely on an inefficient multi-step training workflow to explore the (local) relationships between image target regions and their corresponding labels with an attention mechanism, or model the (global) label dependencies via a graph convolution network (GCN) but fail to efficiently fuse the resulting image features and label word vectors. To address these problems, we develop Cross-modal Fusion for Multi-label Image Classification with attention mechanism (termed CFMIC), which combines an attention mechanism and a GCN to capture the local and global label dependencies simultaneously in an end-to-end manner. CFMIC mainly contains three key modules: (1) a feature extraction module with an attention mechanism, which generates an accurate feature for each input image by focusing on the relationships between image labels and image target regions; (2) a label co-occurrence embedding learning module, which utilizes a GCN to learn the relationships between different objects and generate the label co-occurrence embeddings; and (3) a cross-modal fusion module with Multi-modal Factorized Bilinear pooling (termed MFB), which efficiently fuses the above image features and label co-occurrence embeddings. Extensive experiments on MS-COCO and VOC2007 verify that CFMIC greatly improves convergence efficiency and produces better classification results than state-of-the-art approaches.
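The abstract describes three modules (attention-based feature extraction, GCN-based label co-occurrence embedding, and MFB fusion). The following PyTorch sketch shows one way such a pipeline could be wired together; the backbone choice, layer dimensions, attention formulation, and all names (GCNLayer, MFBFusion, CFMICSketch) are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of the three CFMIC modules outlined in the abstract.
# All sizes and module names below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph-convolution layer over the label co-occurrence graph."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, adj):
        # adj: (L, L) normalized label co-occurrence matrix, x: (L, in_dim)
        return F.leaky_relu(adj @ self.weight(x))


class MFBFusion(nn.Module):
    """Multi-modal Factorized Bilinear pooling of image and label features."""
    def __init__(self, img_dim, lbl_dim, factor=5, out_dim=1024):
        super().__init__()
        self.factor, self.out_dim = factor, out_dim
        self.proj_img = nn.Linear(img_dim, factor * out_dim)
        self.proj_lbl = nn.Linear(lbl_dim, factor * out_dim)

    def forward(self, img_feat, lbl_feat):
        # Element-wise product of the two projections, sum-pooled over the factor dim.
        joint = self.proj_img(img_feat) * self.proj_lbl(lbl_feat)
        joint = joint.view(*joint.shape[:-1], self.out_dim, self.factor).sum(dim=-1)
        # Power and L2 normalization, the usual post-processing for MFB.
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-8)
        return F.normalize(joint, dim=-1)


class CFMICSketch(nn.Module):
    def __init__(self, backbone, num_labels, word_dim=300, img_dim=2048):
        super().__init__()
        self.backbone = backbone                      # CNN returning a (B, img_dim, H, W) map
        self.attn = nn.Linear(img_dim, num_labels)    # region-to-label attention scores
        self.gcn1 = GCNLayer(word_dim, 1024)
        self.gcn2 = GCNLayer(1024, img_dim)
        self.fusion = MFBFusion(img_dim, img_dim)
        self.classifier = nn.Linear(self.fusion.out_dim, 1)  # one logit per fused label feature

    def forward(self, images, label_word_vecs, adj):
        # (1) Attention over spatial regions yields label-specific image features.
        fmap = self.backbone(images)                        # (B, C, H, W)
        regions = fmap.flatten(2).transpose(1, 2)           # (B, HW, C)
        weights = torch.softmax(self.attn(regions), dim=1)  # (B, HW, L)
        img_feat = weights.transpose(1, 2) @ regions        # (B, L, C)
        # (2) Label co-occurrence embeddings from two stacked GCN layers.
        lbl_emb = self.gcn2(self.gcn1(label_word_vecs, adj), adj)        # (L, C)
        lbl_emb = lbl_emb.unsqueeze(0).expand(img_feat.size(0), -1, -1)  # (B, L, C)
        # (3) MFB fusion of the two modalities, then one logit per label.
        fused = self.fusion(img_feat, lbl_emb)              # (B, L, out_dim)
        return self.classifier(fused).squeeze(-1)           # (B, L) label logits
```

A plausible usage, under the same assumptions, would pass GloVe word vectors for the labels and a normalized co-occurrence matrix as `adj`, and train the whole model end-to-end with a multi-label binary cross-entropy loss.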
Pages: 12