Improving fashion captioning via attribute-based alignment and multi-level language model

Cited by: 1
Authors
Tang, Yuhao [1]
Zhang, Liyan [1]
Yuan, Ye [1]
Chen, Zhixian [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Fashion; Image captioning; E-commerce
DOI
10.1007/s10489-023-05167-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Fashion captioning aims to generate detailed and captivating descriptions from a group of images of an item. It requires the model to describe attribute details precisely under the supervision of complex sentences. Existing image captioning methods typically focus on describing a single image and often struggle to capture fine-grained visual representations in the fashion domain. Furthermore, complex description noise and an unbalanced word distribution in fashion datasets limit diverse sentence generation. To alleviate redundancy in raw images, we propose an Attribute-based Alignment Module (AAM), which captures more content-related information to enhance visual representations. Based on this design, we demonstrate that fashion captioning can benefit greatly from grid features with detailed alignment, in contrast to previous success with dense features. To address the inherent word distribution imbalance, we introduce a more balanced corpus called Fashion-Style-27k, collected from various shopping websites. Additionally, we present a pre-trained Fashion Language Model (FLM) that integrates sentence-level and attribute-level language knowledge into the caption model. Experiments on the FACAD and Fashion-Gen datasets show that the proposed AAM-FLM outperforms existing methods. Descriptions in the two datasets differ considerably in length and style, ranging from 21-word detailed descriptions to 30-word template-based sentences, which demonstrates the generalization ability of the proposed model.
Pages: 30757-30777
Page count: 21
Source
Applied Intelligence, 2023, 53