Cross-modal fusion for multi-label image classification with attention mechanism

Times cited: 16

Authors:

Wang, Yangtao [1]

Xie, Yanzhao [2]

Zeng, Jiangfeng [3]

Wang, Hanpin [1]

Fan, Lisheng [1]

Song, Yufan [4]

Affiliations:

[1] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou, Peoples R China

[2] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Peoples R China

[3] Cent China Normal Univ, Sch Informat Management, Wuhan, Peoples R China

[4] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2022 / Vol. 101

Funding:

National Natural Science Foundation of China;

Keywords:

Graph convolution network; Attention mechanism; Cross-modal fusion; Multi-label image classification;

DOI:

10.1016/j.compeleceng.2022.108002

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

For multi-label image classification, existing studies either rely on an inefficient multi-step training workflow to explore the (local) relationships between image target regions and their corresponding labels with an attention mechanism, or model the (global) label dependencies via a graph convolution network (GCN) but fail to fuse image features and label word vectors efficiently. To address these problems, we develop Cross-modal Fusion for Multi-label Image Classification with attention mechanism (CFMIC), which combines an attention mechanism and a GCN to capture local and global label dependencies simultaneously in an end-to-end manner. CFMIC contains three key modules: (1) a feature extraction module with an attention mechanism, which generates an accurate feature for each input image by focusing on the relationships between image labels and image target regions; (2) a label co-occurrence embedding learning module, which uses a GCN to learn the relationships between different objects and generate label co-occurrence embeddings; and (3) a cross-modal fusion module based on Multi-modal Factorized Bilinear pooling (MFB), which efficiently fuses the image features and the label co-occurrence embeddings. Extensive experiments on MS-COCO and VOC2007 verify that CFMIC greatly improves convergence efficiency and produces better classification results than state-of-the-art approaches.
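The abstract names three building blocks: attention over image regions, a GCN over a label co-occurrence graph, and MFB fusion. Below is a minimal, dependency-free Python sketch of generic versions of them. It assumes a standard GCN layer with the symmetric normalization of Kipf & Welling, a generic label-wise softmax attention over region features, and MFB as described by Yu et al. (2017): project both inputs to k*o dimensions, multiply elementwise, sum-pool over windows of k, then apply power and L2 normalization. All function names, dimensions, the random weights, the toy adjacency matrix, and the choice to reuse each label embedding as its attention query are hypothetical. This is not the authors' CFMIC implementation, and it omits the CNN backbone, training, and the classification head.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x m) matrix and an (m x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (Kipf & Welling)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loops (assumes adj >= 0)
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer, ReLU(A_norm @ H @ W); row c of H describes label c."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, val) for val in row] for row in out]


def label_attention(regions: Matrix, query: list[float]) -> list[float]:
    """Generic softmax attention over region features for one label query (hypothetical)."""
    scores = [sum(r * q for r, q in zip(region, query)) for region in regions]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(wt * region[d] for wt, region in zip(weights, regions)) / total
            for d in range(len(regions[0]))]


def mfb_fuse(x: list[float], y: list[float], u: Matrix, v: Matrix, k: int) -> list[float]:
    """MFB after Yu et al. (2017): project x and y to k*o dims, multiply elementwise,
    sum-pool each window of k, then power-normalize and L2-normalize to o dims."""
    px = [sum(xi * u[i][j] for i, xi in enumerate(x)) for j in range(len(u[0]))]
    py = [sum(yi * v[i][j] for i, yi in enumerate(y)) for j in range(len(v[0]))]
    joint = [a * b for a, b in zip(px, py)]
    pooled = [sum(joint[s:s + k]) for s in range(0, len(joint), k)]
    powered = [math.copysign(math.sqrt(abs(z)), z) for z in pooled]
    norm = math.sqrt(sum(z * z for z in powered)) or 1.0  # avoid dividing by zero
    return [z / norm for z in powered]


def demo(seed: int = 0) -> Matrix:
    """Toy wiring with random (untrained) weights; returns one fused vector per label."""
    rng = random.Random(seed)

    def rand(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    num_labels, word_dim, feat_dim, k, o = 3, 4, 5, 2, 3
    adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # labels 0 and 1 co-occur
    label_emb = gcn_layer(normalize_adjacency(adj), rand(num_labels, word_dim),
                          rand(word_dim, feat_dim))
    regions = rand(6, feat_dim)  # stand-in for 6 region features from a CNN backbone
    u, v = rand(feat_dim, k * o), rand(feat_dim, k * o)
    fused = []
    for emb in label_emb:
        image_feat = label_attention(regions, emb)  # assumption: embedding doubles as query
        fused.append(mfb_fuse(image_feat, emb, u, v, k))
    return fused


if __name__ == "__main__":
    for label, vec in enumerate(demo()):
        print(label, [round(z, 3) for z in vec])
```

Running the file prints one o-dimensional fused vector per label. In a trained model the weights would be learned end to end and the fused vectors passed to a classification head; neither step is attempted in this sketch.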


Pages: 12

Related records: 50 in total

[1] Label-Guided Cross-Modal Attention Network for Multi-Label Aerial Image Classification
Chen, Ying
Zhang, Ding
Han, Tao
Meng, Xiaoliang
Gao, Mianxin
Wang, Teng
IEEE Geoscience and Remote Sensing Letters, 2024, 21: 1-5
[2] Multi-Scale Cross-Modal Spatial Attention Fusion for Multi-label Image Recognition
Li, Junbing
Zhang, Changqing
Wang, Xueman
Du, Ling
Artificial Neural Networks and Machine Learning, ICANN 2020, Pt I, 2020, 12396: 736-747
[3] Label graph learning for multi-label image recognition with cross-modal fusion
Xie, Yanzhao
Wang, Yangtao
Liu, Yu
Zhou, Ke
Multimedia Tools and Applications, 2022, 81(18): 25363-25381
[4] Cross-modal multi-label image classification modeling and recognition based on nonlinear
Yuan, Shuping
Chen, Yang
Ye, Chengqiong
Bhatt, Mohammed Wasim
Saradeshmukh, Mhalasakant
Hossain, Md Shamim
Nonlinear Engineering - Modeling and Application, 2023, 12(1)
[5] Multi-Label Cross-modal Retrieval
Ranjan, Viresh
Rasiwasia, Nikhil
Jawahar, C. V.
2015 IEEE International Conference on Computer Vision (ICCV), 2015: 4094-4102
[6] Pyramidal Cross-Modal Transformer with Sustained Visual Guidance for Multi-Label Image Classification
Li, Zhuohua
Wang, Ruyun
Zhu, Fuqing
Han, Jizhong
Hu, Songlin
Proceedings of the 4th Annual ACM International Conference on Multimedia Retrieval, ICMR 2024, 2024: 740-748
[7] Multi-modal bilinear fusion with hybrid attention mechanism for multi-label skin lesion classification
Wei, Yun
Ji, Lin
Multimedia Tools and Applications, 2024, 83(24): 65221-65247
[8] Fast Graph Convolution Network Based Multi-label Image Recognition via Cross-modal Fusion
Wang, Yangtao
Xie, Yanzhao
Liu, Yu
Zhou, Ke
Li, Xiaocui
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020: 1575-1584
