Few-shot Incremental Learning with Textual-knowledge Embedding by Visual-language Model

Cited by: 0
Authors
Yao H.-T. [1 ]
Yu L. [3 ]
Xu C.-S. [1 ,2 ]
Affiliations
[1] State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing
[2] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing
[3] School of Computer Science and Engineering, Tianjin University of Technology, Tianjin
Source
Ruan Jian Xue Bao/Journal of Software | 2024, Vol. 35, No. 5
Keywords
class-space guided anti-forgetting learning; few-shot incremental learning (FSIL); textual-knowledge embedding; visual-language model;
DOI
10.13328/j.cnki.jos.007022
Abstract
Real-world applications often face data scarcity and dynamically changing data. Few-shot incremental learning aims to infer class knowledge from a small amount of data while reducing the model's catastrophic forgetting of old knowledge. Existing few-shot incremental learning algorithms (e.g., CEC and FACT) mainly use visual features to adjust the feature encoder or the classifier, thereby transferring the model to new data while resisting forgetting of old data. However, the visual features of a few samples can rarely model the complete feature distribution of a class, which weakens the generalization ability of these algorithms. Compared with visual features, the text features of image class descriptions generalize better and are more resistant to forgetting. Therefore, based on a visual-language model (VLM), this study investigates few-shot incremental learning with textual-knowledge embedding, which achieves effective learning of both new- and old-class data by embedding anti-forgetting text features into visual features. Specifically, in the base learning stage, the VLM extracts pre-trained visual features and class text descriptions of the images; a text encoder then projects the pre-trained visual features into the text space, and a visual encoder fuses the learned text features with the pre-trained visual features to obtain visual features with high discrimination ability. In the incremental learning stage, the study proposes class space-guided anti-forgetting learning, which uses the class-space encodings of old-class data and new-class features to fine-tune the visual and text encoders, so that new knowledge is learned while old knowledge is reviewed.
The effectiveness of the algorithm is verified on four datasets (CIFAR-100, CUB-200, Car-196, and miniImageNet), showing that VLM-based textual-knowledge embedding can further improve the robustness of few-shot incremental learning beyond visual features alone. © 2024 Chinese Academy of Sciences. All rights reserved.
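The base-stage pipeline described in the abstract (project pre-trained visual features into the text space, fuse the two representations, then classify against class text embeddings) can be sketched as follows. This is a minimal illustrative sketch only: the module names, dimensions, random stand-in weights, and the simple concatenation-based fusion are assumptions, not the paper's actual architecture or the VLM's real API.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(x):
    # Normalize rows to unit length for cosine-similarity scoring.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Random stand-ins for learned weights (hypothetical; the real encoders
# are neural networks trained in the base and incremental stages).
W_text = rng.standard_normal((512, 512)) * 0.02   # visual -> text-space projection
W_fuse = rng.standard_normal((1024, 512)) * 0.02  # fusion of visual + text features

def embed(visual_feat):
    # Project pre-trained visual features into the text space, then fuse
    # the projected text features back with the original visual features.
    text_feat = visual_feat @ W_text
    fused = np.concatenate([visual_feat, text_feat], axis=-1) @ W_fuse
    return l2norm(fused)

def classify(fused_feat, class_text_embs):
    # Cosine-similarity logits against per-class text-description embeddings
    # (in practice produced by the VLM text encoder).
    return fused_feat @ l2norm(class_text_embs).T

# Toy usage with random tensors standing in for VLM outputs.
v = rng.standard_normal((4, 512))    # pre-trained visual features of 4 images
t = rng.standard_normal((10, 512))   # text embeddings of 10 class descriptions
logits = classify(embed(v), t)
print(logits.shape)                  # (4, 10)
```

In the incremental stage, the class space-guided anti-forgetting learning would fine-tune the two projection components using class-space encodings of old and new classes; that training loop is omitted here.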
Pages: 2101-2119 (18 pages)