Cross-modal fusion for multi-label image classification with attention mechanism

Cited by: 16
Authors
Wang, Yangtao [1]
Xie, Yanzhao [2]
Zeng, Jiangfeng [3]
Wang, Hanpin [1]
Fan, Lisheng [1]
Song, Yufan [4]
Affiliations
[1] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou, Peoples R China
[2] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan, Peoples R China
[3] Cent China Normal Univ, Sch Informat Management, Wuhan, Peoples R China
[4] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Graph convolution network; Attention mechanism; Cross-modal fusion; Multi-label image classification;
DOI
10.1016/j.compeleceng.2022.108002
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
For multi-label image classification, existing studies either rely on an inefficient multi-step training workflow that uses an attention mechanism to explore the (local) relationships between image target regions and their corresponding labels, or model the (global) label dependencies via a graph convolution network (GCN) but fail to efficiently fuse the image features with the label word vectors. To address these problems, we develop Cross-modal Fusion for Multi-label Image Classification with attention mechanism (termed CFMIC), which combines an attention mechanism and GCN to capture the local and global label dependencies simultaneously in an end-to-end manner. CFMIC contains three key modules: (1) a feature extraction module with an attention mechanism, which helps generate an accurate feature for each input image by focusing on the relationships between image labels and image target regions; (2) a label co-occurrence embedding learning module, which uses GCN to learn the relationships between different objects and generate the label co-occurrence embeddings; and (3) a cross-modal fusion module with Multi-modal Factorized Bilinear pooling (termed MFB), which efficiently fuses the above image features and label co-occurrence embeddings. Extensive experiments on MS-COCO and VOC2007 verify that CFMIC greatly improves convergence efficiency and produces better classification results than state-of-the-art approaches.
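The abstract names three modules but gives no implementation details. The following is a minimal sketch, in plain Python, of how such a pipeline could be wired together; it is not the authors' code. All function names, dimensions, the single-query attention, the one-layer GCN, and the final linear scoring step (`w_out`) are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random

def matvec(m, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def attend(regions, query):
    """Attention-weighted sum of region features (softmax over region scores)."""
    scores = matvec(regions, query)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    dim = len(regions[0])
    return [sum(e / total * r[j] for e, r in zip(exps, regions)) for j in range(dim)]

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    out = []
    for i in range(n):
        # Aggregate neighbour features with symmetric degree normalization.
        agg = [sum(a_hat[i][j] / math.sqrt(deg[i] * deg[j]) * feats[j][t] for j in range(n))
               for t in range(len(feats[0]))]
        out.append([max(0.0, val) for val in matvec(weight, agg)])
    return out

def mfb_fuse(x, y, u, v, k):
    """MFB-style fusion: project, multiply elementwise, sum-pool groups of k, normalize."""
    joint = [a * b for a, b in zip(matvec(u, x), matvec(v, y))]
    pooled = [sum(joint[i:i + k]) for i in range(0, len(joint), k)]
    powered = [math.copysign(math.sqrt(abs(p)), p) for p in pooled]  # signed square root
    norm = math.sqrt(sum(p * p for p in powered)) or 1.0  # avoid dividing by zero
    return [p / norm for p in powered]

def label_scores(regions, query, adj, word_vecs, w_gcn, u, v, w_out, k):
    """Score every label by fusing the image feature with that label's embedding."""
    x = attend(regions, query)
    label_embs = gcn_layer(adj, word_vecs, w_gcn)
    scores = []
    for emb in label_embs:
        fused = mfb_fuse(x, emb, u, v, k)
        logit = sum(a * b for a, b in zip(w_out, fused))  # hypothetical linear head
        scores.append(1.0 / (1.0 + math.exp(-logit)))     # independent sigmoid per label
    return scores

def rand_matrix(rows, cols):
    """Random Gaussian matrix standing in for learned weights or input features."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
num_regions, img_dim, num_labels, word_dim, emb_dim, out_dim, k = 5, 8, 4, 6, 7, 3, 2
# Toy symmetric label co-occurrence graph over four labels.
adj = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
scores = label_scores(
    rand_matrix(num_regions, img_dim), rand_matrix(1, img_dim)[0], adj,
    rand_matrix(num_labels, word_dim), rand_matrix(emb_dim, word_dim),
    rand_matrix(out_dim * k, img_dim), rand_matrix(out_dim * k, emb_dim),
    rand_matrix(1, out_dim)[0], k)
print(scores)  # one probability-like score per label
```

In this sketch each label receives an independent sigmoid score, which is the usual choice for multi-label outputs; how the published model maps the fused representation to label predictions may differ.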
Pages: 12