Clothes image caption generation with attribute detection and visual attention model

Cited by: 14
Authors
Li, Xianrui [1]
Ye, Zhiling [1]
Zhang, Zhao [2, 3]
Zhao, Mingbo [1]
Affiliations
[1] Donghua Univ, Shanghai, Peoples R China
[2] Soochow Univ, Suzhou, Jiangsu, Peoples R China
[3] Hefei Univ Technol, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image caption generation; Visual attention mechanism; LSTM; Fashion AI; CNN; Transfer learning;
DOI
10.1016/j.patrec.2020.12.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fashion is a multi-billion-dollar industry with direct social, cultural, and economic implications in the real world. While computer vision has demonstrated remarkable success in fashion applications, natural language processing has also begun to contribute to the area, building a connection between clothes images and human semantic understanding. A fundamental task in combining image and language understanding is generating a natural language sentence that accurately summarizes the contents of a clothes image. In this paper, we develop a joint attribute detection and visual attention framework for clothes image captioning. Specifically, in order to involve more clothing attributes in learning, we first use a pre-trained Convolutional Neural Network (CNN) to learn features that characterize more information about clothing attributes. Based on these learned features, we then adopt an encoder-decoder framework, in which we first encode the clothes features and then input them to a Long Short-Term Memory (LSTM) language model that decodes the clothes descriptions. The method greatly enhances the performance of clothes image captioning and reduces misleading attention. Extensive simulations on real-world data verify the effectiveness of the proposed method. (C) 2020 Elsevier B.V. All rights reserved.
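The abstract names a visual attention mechanism but does not spell out its form. As a rough illustration only, the sketch below shows one generic soft-attention step of the kind used in attention-based captioning: region features are scored against the decoder state, the scores are normalized with a softmax, and the weighted sum becomes the context passed to the LSTM. The function names `softmax` and `attend`, the dot-product scoring, and the toy numbers are all assumptions made for this example; they are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, hidden_state):
    """Return (weights, context) for one decoding step.

    region_features: list of L vectors (one per image region), each of length D.
    hidden_state: decoder state vector of length D.
    Scoring is a plain dot product, an assumption made for brevity;
    the paper may use a different scoring function.
    """
    scores = [sum(f * h for f, h in zip(feat, hidden_state))
              for feat in region_features]
    weights = softmax(scores)
    dim = len(hidden_state)
    # Context vector: attention-weighted sum of the region features.
    context = [sum(w * feat[d] for w, feat in zip(weights, region_features))
               for d in range(dim)]
    return weights, context

# Toy example: three regions with 2-dimensional features (made-up numbers).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(regions, state)
```

In this toy case the first and third regions align equally with the decoder state, so they receive equal weight, and the second region receives the least. In a full captioning model the context vector would be fed to the LSTM at each step alongside the previous word embedding.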
Pages: 68-74
Number of pages: 7
Related papers
  • [1] Image Caption Generation Using Attention Model
    Ramalakshmi, Eliganti
    Jain, Moksh Sailesh
    Uddin, Mohammed Ameer
    [J]. INNOVATIVE DATA COMMUNICATION TECHNOLOGIES AND APPLICATION, ICIDCA 2021, 2022, 96 : 1009 - 1017
  • [2] Cross-Lingual Image Caption Generation Based on Visual Attention Model
    Wang, Bin
    Wang, Cungang
    Zhang, Qian
    Su, Ying
    Wang, Yang
    Xu, Yanyan
    [J]. IEEE ACCESS, 2020, 8 : 104543 - 104554
  • [3] Image Caption Generation with Hierarchical Contextual Visual Spatial Attention
    Khademi, Mahmoud
    Schulte, Oliver
    [J]. PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 2024 - 2032
  • [4] Chinese Image Caption Generation via Visual Attention and Topic Modeling
    Liu, Maofu
    Hu, Huijun
    Li, Lingjun
    Yu, Yan
    Guan, Weili
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02) : 1247 - 1257
  • [5] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
    Xu, Kelvin
    Ba, Jimmy Lei
    Kiros, Ryan
    Cho, Kyunghyun
    Courville, Aaron
    Salakhutdinov, Ruslan
    Zemel, Richard S.
    Bengio, Yoshua
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 2048 - 2057
  • [6] Visual Attention Based on Long-Short Term Memory Model for Image Caption Generation
    Qu, Shiru
    Xi, Yuling
    Ding, Songtao
    [J]. 2017 29TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2017, : 4789 - 4794
  • [7] GVA: guided visual attention approach for automatic image caption generation
    Hossen, Md. Bipul
    Ye, Zhongfu
    Abdussalam, Amr
    Hossain, Md. Imran
    [J]. MULTIMEDIA SYSTEMS, 2024, 30 (01)
  • [8] GVA: guided visual attention approach for automatic image caption generation
    Md. Bipul Hossen
    Zhongfu Ye
    Amr Abdussalam
    Md. Imran Hossain
    [J]. Multimedia Systems, 2024, 30
  • [9] Recurrent Attention LSTM Model for Image Chinese Caption Generation
    Zhang, Chaoying
    Dai, Yaping
    Cheng, Yanyan
    Jia, Zhiyang
    Hirota, Kaoru
    [J]. 2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018, : 808 - 813
  • [10] Image caption based on Visual Attention Mechanism
    Zhou, Jinfei
    Zhu, Yaping
    Pan, Hong
    [J]. PROCEEDINGS OF 2019 INTERNATIONAL CONFERENCE ON IMAGE, VIDEO AND SIGNAL PROCESSING (IVSP 2019), 2019, : 28 - 32