Cross-modal fusion for multi-label image classification with attention mechanism

Cited by: 16

Authors:

Wang, Yangtao [1]

Xie, Yanzhao [2]

Zeng, Jiangfeng [3]

Wang, Hanpin [1]

Fan, Lisheng [1]

Song, Yufan [4]

Affiliations:

[1] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou, Peoples R China

[2] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Peoples R China

[3] Cent China Normal Univ, Sch Informat Management, Wuhan, Peoples R China

[4] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2022 / Volume 101

Funding:

National Natural Science Foundation of China;

Keywords:

Graph convolution network; Attention mechanism; Cross-modal fusion; Multi-label image classification;

DOI:

10.1016/j.compeleceng.2022.108002

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

For multi-label image classification, existing studies take one of two routes. Some rely on an inefficient multi-step training workflow that uses an attention mechanism to explore the (local) relationships between image target regions and their corresponding labels. Others model the (global) label dependencies via a graph convolution network (GCN) but fail to efficiently fuse the image features with the label word vectors. To address these problems, we develop Cross-modal Fusion for Multi-label Image Classification with attention mechanism (termed CFMIC), which combines an attention mechanism and a GCN to capture the local and global label dependencies simultaneously in an end-to-end manner. CFMIC contains three key modules: (1) a feature extraction module with an attention mechanism, which generates an accurate feature for each input image by focusing on the relationships between image labels and image target regions; (2) a label co-occurrence embedding learning module, which uses a GCN to learn the relationships between different objects and generate the label co-occurrence embeddings; and (3) a cross-modal fusion module with Multi-modal Factorized Bilinear pooling (termed MFB), which efficiently fuses the above image features and label co-occurrence embeddings. Extensive experiments on MS-COCO and VOC2007 show that CFMIC greatly improves convergence efficiency and produces better classification results than state-of-the-art approaches.
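The abstract names two components that can be illustrated compactly: a GCN over the label graph and MFB fusion of an image feature with a label embedding. The following is a minimal NumPy sketch of those two generic techniques, not the authors' implementation. All function names, dimensions, the toy adjacency matrix, the random weights, and the choice of symmetric normalization with a ReLU activation are assumptions made for illustration; the attention-based feature extractor is replaced by a random vector.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (assumes A is symmetric)."""
    A_tilde = A + np.eye(A.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One graph convolution: mix label features over the graph, project, apply ReLU."""
    return np.maximum(A_hat @ H @ W, 0.0)

def mfb_fuse(x, y, U, V, k):
    """MFB: project both inputs, multiply element-wise, sum-pool over windows of size k."""
    joint = (U.T @ x) * (V.T @ y)                # shape (k * o,)
    z = joint.reshape(-1, k).sum(axis=1)         # shape (o,)
    z = np.sign(z) * np.sqrt(np.abs(z))          # power normalization
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z           # L2 normalization

# Toy setup (hypothetical sizes): 4 labels, 6-d word vectors, 8-d image feature.
rng = np.random.default_rng(0)
num_labels, word_dim, embed_dim, img_dim, k, o = 4, 6, 5, 8, 3, 2

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)        # made-up label co-occurrence graph
A_hat = normalize_adjacency(A)

label_words = rng.normal(size=(num_labels, word_dim))
W = rng.normal(size=(word_dim, embed_dim))
label_emb = gcn_layer(A_hat, label_words, W)     # one embedding per label

image_feat = rng.normal(size=img_dim)            # stand-in for the attention-based feature
U = rng.normal(size=(img_dim, k * o))
V = rng.normal(size=(embed_dim, k * o))

# Fuse the image feature with each label embedding in turn.
fused = np.stack([mfb_fuse(image_feat, e, U, V, k) for e in label_emb])
print(fused.shape)
```

In this sketch each label receives its own fused vector; how CFMIC turns the fused representation into per-label scores is not stated in the abstract, so that step is left out.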


Pages: 12
