Improving fashion captioning via attribute-based alignment and multi-level language model

Cited: 1
Authors
Tang, Yuhao [1 ]
Zhang, Liyan [1 ]
Yuan, Ye [1 ]
Chen, Zhixian [1 ]
Affiliation
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fashion; Image captioning; E-commerce;
DOI
10.1007/s10489-023-05167-2
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fashion captioning aims to generate detailed and captivating descriptions from a group of item images. It requires the model to precisely describe attribute details under the supervision of complex sentences. Existing image captioning methods typically focus on describing a single image and often struggle to capture fine-grained visual representations in the fashion domain. Furthermore, the complex description noise and unbalanced word distribution in fashion datasets limit diverse sentence generation. To alleviate redundancy in raw images, we propose an Attribute-based Alignment Module (AAM). The AAM captures more content-related information to enhance visual representations. Based on this design, we demonstrate that fashion captioning can benefit greatly from grid features with detailed alignment, in contrast to previous success with dense features. To address the inherent word distribution imbalance, we introduce a more balanced corpus called Fashion-Style-27k, collected from various shopping websites. Additionally, we present a pre-trained Fashion Language Model (FLM) that integrates sentence-level and attribute-level language knowledge into the caption model. Experiments on the FACAD and Fashion-Gen datasets show that the proposed AAM-FLM outperforms existing methods. Descriptions in the two datasets differ considerably in length and style, ranging from 21-word detailed descriptions to 30-word template-based sentences, demonstrating the generalization ability of the proposed model.
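The abstract describes aligning image grid features with attribute information to enhance visual representations. The paper itself does not give the module's equations here, so the following is only a minimal illustrative sketch of one common way such attribute-based alignment is realized: scaled dot-product cross-attention in which learned attribute embeddings attend over the grid features, producing one attribute-aligned visual feature per attribute. All names (`attribute_align`, the shapes) are hypothetical, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attribute_align(grid_feats, attr_embeds):
    """Hypothetical attribute-based alignment via cross-attention.

    grid_feats:  (N, d) array - N grid cells from the image backbone
    attr_embeds: (K, d) array - K learned attribute embeddings
    Returns:     (K, d) array - one attribute-aligned visual feature
                 per attribute, a weighted sum of grid features.
    """
    d = grid_feats.shape[1]
    # Attention scores of each attribute over each grid cell.
    scores = attr_embeds @ grid_feats.T / np.sqrt(d)   # (K, N)
    weights = softmax(scores, axis=-1)                 # rows sum to 1
    return weights @ grid_feats                        # (K, d)

# Toy usage: a 7x7 grid (49 cells) of 64-d features, 5 attributes.
rng = np.random.default_rng(0)
grid = rng.normal(size=(49, 64))
attrs = rng.normal(size=(5, 64))
aligned = attribute_align(grid, attrs)
print(aligned.shape)  # (5, 64)
```

In a full captioning model these aligned features would condition the language decoder; this sketch only shows the alignment step itself.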
Pages: 30757-30777
Number of pages: 21
Related papers
50 records
  • [21] MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning
    Li, Zejun
    Fan, Zhihao
    Tou, Huaixiao
    Chen, Jingjing
    Wei, Zhongyu
    Huang, Xuanjing
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4395 - 4405
  • [23] Community-aware graph embedding via multi-level attribute integration
    Li, Yafang
    Wang, Wenbo
    Wei, Jianwen
    Zu, Baokai
    KNOWLEDGE AND INFORMATION SYSTEMS, 2023, 65 (12) : 5635 - 5655
  • [24] Attribute-based Encrypted Search for Multi-owner and Multi-user Model
    Wang, Mingyue
    Miao, Yinbin
    Guo, Yu
    Wang, Cong
    Huang, Hejiao
    Jia, Xiaohua
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021,
  • [25] Improving File Hierarchy Attribute-Based Encryption Scheme with Multi-authority in Cloud
    Kang, Li
    Zhang, Leyou
    FRONTIERS IN CYBER SECURITY, FCS 2019, 2019, 1105 : 3 - 18
  • [26] Image Captioning Model Based on Multi Level Visual Fusion
    Zhou D.-M.
    Zhang C.-L.
    Li Z.-X.
    Wang Z.-W.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2021, 49 (07): : 1286 - 1290
  • [27] Multi-level Video Captioning based on Label Classification using Machine Learning Techniques
    Vaishnavi, J.
    Narmatha, V.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (11) : 582 - 588
  • [28] Multi-Level Fusion Model for Person Re-Identification by Attribute Awareness
    Pei, Shengyu
    Fan, Xiaoping
    ALGORITHMS, 2022, 15 (04)
  • [29] Multi-level social network alignment via adversarial learning and graphlet modeling
    Duan, Jingyuan
    Kang, Zhao
    Tian, Ling
    Xin, Yichen
    NEURAL NETWORKS, 2025, 185
  • [30] Evidencing learning outcomes: a multi-level, multi-dimensional course alignment model
    Sridharan, Bhavani
    Leitch, Shona
    Watty, Kim
    QUALITY IN HIGHER EDUCATION, 2015, 21 (02) : 171 - 188