MFF: Multi-modal feature fusion for zero-shot learning

Cited: 8
Authors
Cao, Weipeng [1 ,2 ]
Wu, Yuhao [2 ]
Huang, Chengchao [3 ]
Patwary, Muhammed J. A. [4 ]
Wang, Xizhao [2 ]
Affiliations
[1] Civil Aviat Univ China, CAAC Key Lab Civil Aviat Wide Surveillance & Safet, Tianjin 300300, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Guangdong, Peoples R China
[3] Chinese Acad Sci, Nanjing Inst Software Technol, Nanjing 210000, Jiangsu, Peoples R China
[4] Int Islamic Univ Chittagong, Dept Comp Sci & Engn, Chattogram 4318, Bangladesh
Funding
National Natural Science Foundation of China
Keywords
Zero-shot learning; Generative method; Variational auto-encoder; Generative adversarial network; Feature fusion
DOI
10.1016/j.neucom.2022.09.070
CLC classification
TP18 [Artificial intelligence theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Generative Zero-Shot Learning (ZSL) methods generally generate pseudo-samples/features based on the semantic description information of unseen classes, thereby transforming ZSL tasks into traditional supervised learning tasks. Under this learning paradigm, the quality of the pseudo-samples/features guided by the classes' semantic description information is the key to the success of the model. However, the semantic description information used in existing generative methods is mainly a low-dimensional representation (e.g., attributes) of the classes, which leads to low quality of the generated pseudo-samples/features and may aggravate the problem of domain shift. To alleviate this problem, we introduce the visual principal component feature, which is extracted by a principal component analysis network, to make up for the deficiency of using only semantic description information, and propose a novel Variational Auto-Encoder (VAE) and Generative Adversarial Network (GAN) based generative method for ZSL, which we call the Multi-modal Feature Fusion algorithm (MFF). In MFF, the input of different modal information enables the VAE to better fit the original data distribution, and the proposed alignment loss ensures the consistency of the generated visual features and the corresponding semantic features. With the help of high-quality pseudo-samples/features, the ZSL model can make more accurate predictions for unseen classes. Extensive experiments on five public datasets demonstrate that our proposed algorithm outperforms several state-of-the-art methods under both ZSL and generalized ZSL settings. (c) 2022 Elsevier B.V. All rights reserved.
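The abstract's key idea — extracting visual principal-component features and fusing them with class attributes to condition the generator — can be sketched minimally as follows. This is an illustrative reconstruction, not the paper's implementation: plain PCA via SVD stands in for the paper's PCA network, `fuse` is simple concatenation, and `alignment_loss` with projection `W` is a hypothetical MSE-style stand-in for the paper's alignment loss.

```python
import numpy as np

def pca_features(X, k):
    """Project samples onto their top-k principal components.

    Plain PCA here is only a stand-in for the paper's principal
    component analysis network.
    """
    Xc = X - X.mean(axis=0)                       # center the data
    # SVD of the centered data; rows of Vt are principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # (n, k) component scores

def fuse(attr, pc):
    """Concatenate class-attribute vectors with visual principal-component
    features to form the multi-modal conditioning input."""
    return np.concatenate([attr, pc], axis=1)

def alignment_loss(gen_visual, semantic, W):
    """Hypothetical alignment loss: MSE between generated visual features
    and a linear projection of the semantic features. W is an assumed
    learnable projection; the paper's exact loss may differ."""
    return np.mean((gen_visual - semantic @ W) ** 2)

# Toy usage with random stand-ins for real features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 32))     # visual features (e.g., CNN embeddings)
attr = rng.normal(size=(100, 8))   # per-sample class attribute vectors
pc = pca_features(X, k=4)
z = fuse(attr, pc)                 # (100, 12) multi-modal input to the VAE
```

The fused vector `z` would condition the VAE/GAN generator; at training time the alignment term would be added to the usual VAE reconstruction/KL and GAN adversarial losses.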
Pages: 172-180
Page count: 9
Related papers
(50 total)
  • [1] Multi-modal generative adversarial network for zero-shot learning
    Ji, Zhong
    Chen, Kexin
    Wang, Junyue
    Yu, Yunlong
    Zhang, Zhongfei
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 197
  • [2] A Deep Multi-Modal Explanation Model for Zero-Shot Learning
    Liu, Yu
    Tuytelaars, Tinne
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 4788 - 4803
  • [3] Generalised Zero-shot Learning with Multi-modal Embedding Spaces
    Felix, Rafael
    Sasdelli, Michele
    Harwood, Ben
    Carneiro, Gustavo
    [J]. 2020 DIGITAL IMAGE COMPUTING: TECHNIQUES AND APPLICATIONS (DICTA), 2020
  • [4] A Zero-shot Learning Method with a Multi-modal Knowledge Graph
    Zhang, Yuhong
    Shu, Haitao
    Bu, Chenyang
    Hu, Xuegang
    [J]. 2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 391 - 395
  • [5] Multi-modal Cycle-Consistent Generalized Zero-Shot Learning
    Felix, Rafael
    Kumar, B. G. Vijay
    Reid, Ian
    Carneiro, Gustavo
    [J]. COMPUTER VISION - ECCV 2018, PT VI, 2018, 11210 : 21 - 37
  • [6] MULTIINSTRUCT: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
    Xu, Zhiyang
    Shen, Ying
    Huang, Lifu
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 11445 - 11465
  • [7] Zero-shot Event Detection using Multi-modal Fusion of Weakly Supervised Concepts
    Wu, Shuang
    Bondugula, Sravanthi
    Luisier, Florian
    Zhuang, Xiaodan
    Natarajan, Pradeep
    [J]. 2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 2665 - 2672
  • [8] Improving Zero-shot Generalization and Robustness of Multi-modal Models
    Ge, Yunhao
    Ren, Jie
    Gallagher, Andrew
    Wang, Yuxiao
    Yang, Ming-Hsuan
    Adam, Hartwig
    Itti, Laurent
    Lakshminarayanan, Balaji
    Zhao, Jiaping
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11093 - 11101
  • [9] Multi-modal zero-shot dynamic hand gesture recognition
    Rastgoo, Razieh
    Kiani, Kourosh
    Escalera, Sergio
    Sabokrou, Mohammad
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 247
  • [10] Multi-Modal Multi-Grained Embedding Learning for Generalized Zero-Shot Video Classification
    Hong, Mingyao
    Zhang, Xinfeng
    Li, Guorong
    Huang, Qingming
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (10) : 5959 - 5972