Improving fashion captioning via attribute-based alignment and multi-level language model

Cited by: 1
Authors
Tang, Yuhao [1]
Zhang, Liyan [1]
Yuan, Ye [1]
Chen, Zhixian [1]
Affiliations
[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology, Nanjing 211106, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Fashion; Image captioning; E-commerce
DOI
10.1007/s10489-023-05167-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fashion captioning aims to generate detailed and captivating descriptions based on a group of item images. It requires the model to precisely describe attribute details under the supervision of complex sentences. Existing image captioning methods typically focus on describing a single image and often struggle to capture fine-grained visual representations in the fashion domain. Furthermore, the presence of complex description noise and unbalanced word distribution in fashion datasets limits diverse sentence generation. To alleviate redundancy in raw images, we propose an Attribute-based Alignment Module (AAM). The AAM captures more content-related information to enhance visual representations. Based on this design, we demonstrate that fashion captioning can benefit greatly from grid features with detailed alignment, in contrast to previous success with dense features. To address the inherent word distribution imbalance, we introduce a more balanced corpus called Fashion-Style-27k, collected from various shopping websites. Additionally, we present a pre-trained Fashion Language Model (FLM) that integrates sentence-level and attribute-level language knowledge into the caption model. Experiments on the FACAD and Fashion-Gen datasets show that the proposed AAM-FLM outperforms existing methods. Descriptions in the two datasets differ considerably in length and style, ranging from the 21-word detailed description to the 30-word template-based sentence, which demonstrates the generalization ability of the proposed model.
Pages: 30757-30777
Number of pages: 21