General Multi-label Image Classification with Transformers

Cited by: 178
Authors
Lanchantin, Jack [1]
Wang, Tianlu [1]
Ordonez, Vicente [1]
Qi, Yanjun [1]
Affiliations
[1] Univ Virginia, Charlottesville, VA 22903 USA
Funding
National Science Foundation (USA)
DOI
10.1109/CVPR46437.2021.01621
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-label image classification is the task of predicting a set of labels corresponding to objects, attributes or other entities present in an image. In this work we propose the Classification Transformer (C-Tran), a general framework for multi-label image classification that leverages Transformers to exploit the complex dependencies among visual features and labels. Our approach consists of a Transformer encoder trained to predict a set of target labels given an input set of masked labels, and visual features from a convolutional neural network. A key ingredient of our method is a label mask training objective that uses a ternary encoding scheme to represent the state of the labels as positive, negative, or unknown during training. Our model shows state-of-the-art performance on challenging datasets such as COCO and Visual Genome. Moreover, because our model explicitly represents the label state during training, it is more general by allowing us to produce improved results for images with partial or extra label annotations during inference. We demonstrate this additional capability in the COCO, Visual Genome, News-500, and CUB image datasets.
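The ternary label-state encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the integer codes (`UNKNOWN`, `NEGATIVE`, `POSITIVE`), the function name `ternary_label_states`, and the uniform random choice of which labels to hide are all assumptions made for illustration. The sketch only shows how binary ground-truth labels might be turned into positive, negative, or unknown states, with a chosen number of labels hidden so that a model could be asked to predict them.

```python
import random

# Hypothetical integer codes for the three label states (an assumption, not from the source).
UNKNOWN, NEGATIVE, POSITIVE = 0, -1, 1


def ternary_label_states(labels, num_masked, rng):
    """Encode binary labels as ternary states, hiding `num_masked` of them.

    labels: list of 0/1 ground-truth values, one per label.
    Returns (states, masked_indices), where masked labels carry the UNKNOWN state.
    """
    if not 0 <= num_masked <= len(labels):
        raise ValueError("num_masked must be between 0 and len(labels)")
    # Pick which label positions to hide (uniform sampling is an assumption).
    masked = set(rng.sample(range(len(labels)), num_masked))
    states = [
        UNKNOWN if i in masked else (POSITIVE if y == 1 else NEGATIVE)
        for i, y in enumerate(labels)
    ]
    return states, sorted(masked)


labels = [1, 0, 0, 1, 0]
states, masked = ternary_label_states(labels, num_masked=2, rng=random.Random(0))
print(states, masked)
```

Under this sketch, hiding every label (`num_masked=len(labels)`) corresponds to inference with no annotations, while hiding only some corresponds to the partial-label setting the abstract mentions.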
Pages: 16473-16483
Page count: 11