Image-Captioning Model Compression

Cited by: 2
Authors
Atliha, Viktar [1]
Sesok, Dmitrij [1]
Affiliation
[1] Vilnius Gediminas Tech Univ, Dept Informat Technol, Sauletekio Al 11, LT-10223 Vilnius, Lithuania
Source
APPLIED SCIENCES-BASEL | 2022, Vol. 12, No. 3
Keywords
image captioning; model compression; pruning; quantization; network
DOI
10.3390/app12031638
Chinese Library Classification
O6 [Chemistry]
Discipline code
0703
Abstract
Image captioning is an important task at the intersection of natural language processing (NLP) and computer vision (CV). Current captioning models are accurate enough for practical use, but they require both substantial computational power and considerable storage space. Despite the practical importance of image captioning, only a few papers have investigated compressing these models for use on mobile devices. Furthermore, these works usually compress only the decoder of the typical encoder-decoder architecture, even though the encoder traditionally occupies most of the space. We applied the most efficient model-compression techniques, such as architectural changes, pruning, and quantization, to several state-of-the-art image-captioning architectures. As a result, all of these models were compressed by at least 91% in memory (including the encoder), while losing no more than 2% in CIDEr and 4.5% in SPICE. The best model achieved 127.4 CIDEr and 21.4 SPICE with a size of only 34.8 MB, which sets a strong baseline for image-captioning model compression and could be used in practical applications.
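The abstract names pruning and quantization without describing a specific scheme. As a rough illustration only, the following stdlib-only Python sketch shows two common variants: unstructured magnitude pruning and symmetric per-tensor 8-bit linear quantization, plus a storage estimate that assumes a one-bit-per-weight mask for pruned positions. These choices and all function names (`magnitude_prune`, `quantize_int8`, `dequantize`, `compressed_size_bytes`) are hypothetical illustrations, not the authors' implementation; the paper's actual pruning and quantization settings may differ.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Unstructured magnitude pruning (illustrative): zero the `sparsity`
    fraction of weights with the smallest absolute value, keep the rest."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices ordered from smallest to largest magnitude; the first k are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor linear quantization to integers in [-127, 127].
    Returns the integer codes and the scale needed to dequantize them."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def compressed_size_bytes(num_weights: int, sparsity: float, value_bits: int) -> float:
    """Rough storage estimate under an ASSUMED encoding: a 1-bit mask per
    weight marking kept/pruned positions, plus `value_bits` per kept weight."""
    kept = num_weights - int(num_weights * sparsity)
    return (num_weights + kept * value_bits) / 8
```

For example, pruning half of 1,000,000 float32 weights and storing the survivors as 8-bit codes gives `compressed_size_bytes(1_000_000, 0.5, 8) == 625000.0` bytes versus 4,000,000 bytes dense, an 84.375% reduction under this hypothetical encoding. Because pruned weights stay exactly zero after quantization, the two steps compose; the mask overhead is why high sparsity and low bit width are usually combined to reach large reductions.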
Pages: 14