Improving fashion captioning via attribute-based alignment and multi-level language model

Times cited: 1

Authors:

Tang, Yuhao [1]

Zhang, Liyan [1]

Yuan, Ye [1]

Chen, Zhixian [1]

Affiliation:

[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China

Source:

APPLIED INTELLIGENCE | 2023 / Vol. 53 / No. 24

Funding:

National Natural Science Foundation of China;

Keywords:

Fashion; Image captioning; E-commerce;

DOI:

10.1007/s10489-023-05167-2

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104; 0812; 0835; 1405;

Abstract:

Fashion captioning aims to generate detailed and captivating descriptions from a group of item images. It requires the model to describe attribute details precisely under the supervision of complex sentences. Existing image captioning methods typically focus on describing a single image and often struggle to capture fine-grained visual representations in the fashion domain. Furthermore, complex description noise and an unbalanced word distribution in fashion datasets limit the diversity of generated sentences. To alleviate redundancy in raw images, we propose an Attribute-based Alignment Module (AAM), which captures more content-related information to enhance visual representations. Based on this design, we demonstrate that fashion captioning can benefit greatly from grid features with detailed alignment, in contrast to the previous success of dense features. To address the inherent imbalance in word distribution, we introduce a more balanced corpus, Fashion-Style-27k, collected from various shopping websites. Additionally, we present a pre-trained Fashion Language Model (FLM) that integrates sentence-level and attribute-level language knowledge into the captioning model. Experiments on the FACAD and Fashion-Gen datasets show that the proposed AAM-FLM outperforms existing methods. Because the descriptions in the two datasets differ considerably in length and style, ranging from 21-word detailed descriptions to 30-word template-based sentences, these results demonstrate the generalization ability of the proposed model.


Pages: 30757-30777

Number of pages: 21

Related records (50 in total):

[31] Improving Data Provenance Reconstruction via a Multi-Level Funneling Approach
Vasudevan, Subha
Pfeffer, William
Davis, Delmar
Asuncion, Hazeline
PROCEEDINGS OF THE 2016 IEEE 12TH INTERNATIONAL CONFERENCE ON E-SCIENCE (E-SCIENCE), 2016: 175-184
[32] Deep Incomplete Multi-view Clustering via Multi-level Imputation and Contrastive Alignment
Wang, Ziyu
Du, Yiming
Wang, Yao
Ning, Rui
Li, Lusi
NEURAL NETWORKS, 2025, 181
[33] Improving Privacy and Security in Decentralizing Multi-Authority Attribute-Based Encryption in Cloud Computing
Yang, Yan
Chen, Xingyuan
Chen, Hao
Du, Xuehui
IEEE ACCESS, 2018, 6: 18009-18021
[34] A Multi-level Access Control Scheme Based on Attribute Encryption for Big Data
Li, Ruixia
Peng, Wei
2019 4TH INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE 2019), 2019: 694-698
[35] Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning
Nguyen, Hoang H.
Zhang, Chenwei
Liu, Ye
Yu, Philip S.
24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023: 470-481
[36] Multi-Level Policy and Reward-Based Deep Reinforcement Learning Framework for Image Captioning
Xu, Ning
Zhang, Hanwang
Liu, An-An
Nie, Weizhi
Su, Yuting
Nie, Jie
Zhang, Yongdong
IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (05): 1372-1383
[37] A Text Classification Model via Multi-Level Semantic Features
Mao, Keji
Xu, Jinyu
Yao, Xingda
Qiu, Jiefan
Chi, Kaikai
Dai, Guanglin
SYMMETRY-BASEL, 2022, 14 (09)
[38] A Grey Multi-Level Evaluation of Industrial Park Ecology Based on a Coefficient of Variation-Attribute Hierarchy Model
Qiu, Baolin
Luo, Dongkun
SUSTAINABILITY, 2021, 13 (04): 1-22
[39] CLIP-Based Multi-level Alignment for Text-based Person Search
Wu, Zhijun
Ma, Shiwei
2024 5TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATION, ICCEA 2024, 2024: 610-614
[40] Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification
Zhai, Yajing
Zeng, Yawen
Huang, Zhiyong
Qin, Zheng
Jin, Xin
Cao, Da
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 7, 2024: 6979-6987
