Improving fashion captioning via attribute-based alignment and multi-level language model

Cited by: 1
|
Authors
Tang, Yuhao [1]
Zhang, Liyan [1]
Yuan, Ye [1]
Chen, Zhixian [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 211106, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Fashion; Image captioning; E-commerce;
DOI
10.1007/s10489-023-05167-2
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fashion captioning aims to generate detailed and captivating descriptions based on a group of item images. It requires the model to precisely describe attribute details under the supervision of complex sentences. Existing image captioning methods typically focus on describing a single image and often struggle to capture fine-grained visual representations in the fashion domain. Furthermore, the presence of complex description noise and unbalanced word distribution in fashion datasets limits diverse sentence generation. To alleviate redundancy in raw images, we propose an Attribute-based Alignment Module (AAM). The AAM captures more content-related information to enhance visual representations. Based on this design, we demonstrate that fashion captioning can benefit greatly from grid features with detailed alignment, in contrast to previous success with dense features. To address the inherent word distribution imbalance, we introduce a more balanced corpus called Fashion-Style-27k, collected from various shopping websites. Additionally, we present a pre-trained Fashion Language Model (FLM) that integrates sentence-level and attribute-level language knowledge into the caption model. Experiments on the FACAD and Fashion-Gen datasets show that the proposed AAM-FLM outperforms existing methods. Descriptions in the two datasets differ considerably in length and style, ranging from 21-word detailed descriptions to 30-word template-based sentences, demonstrating the generalization ability of the proposed model.
Pages: 30757 - 30777
Page count: 21
Related papers
50 records total
  • [11] Next Basket Recommendation Model Based on Attribute-Aware Multi-Level Attention
    Liu, Tong
    Yin, Xianrui
    Ni, Weijian
    IEEE ACCESS, 2020, 8 : 153872 - 153880
  • [12] A Multi-level Alignment Training Scheme for Video-and-Language Grounding
    Zhang, Yubo
    Niu, Feiyang
    Ping, Qing
    Thattai, Govind
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022, : 958 - 966
  • [13] Middle-Level Attribute-Based Language Retouching for Image Caption Generation
    Guan, Zhibin
    Liu, Kang
    Ma, Yan
    Qian, Xu
    Ji, Tongkai
    APPLIED SCIENCES-BASEL, 2018, 8 (10):
  • [14] Attribute-Guided Multi-Level Attention Network for Fine-Grained Fashion Retrieval
    Xiao, Ling
    Yamasaki, Toshihiko
    IEEE ACCESS, 2024, 12 : 48068 - 48080
  • [15] Attribute Based Signatures for Bounded Multi-level Threshold Circuits
    Kumar, Swarun
    Agrawal, Shivank
    Balaraman, Subha
    Rangan, C. Pandu
    PUBLIC KEY INFRASTRUCTURES, SERVICES AND APPLICATIONS, 2011, 6711 : 141 - 154
  • [16] Mulan: A Multi-Level Alignment Model for Video Question Answering
    Fu, Yu
    Cao, Cong
    Yang, Yuling
    Lu, Yuhai
    Yuan, Fangfang
    Wang, Dakui
    Liu, Yanbing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 5475 - 5489
  • [17] Model based multi-level prototyping
    Bredenfeld, A
    Wilberg, J
    TENTH IEEE INTERNATIONAL WORKSHOP ON RAPID SYSTEMS PROTOTYPING, PROCEEDINGS, 1999, : 190 - 195
  • [18] A multi-level/multi-type model for design-based alignment of instruction, assessment, and testing
    Hickey, DT
    Zuiker, SJ
    McGee, S
    ICLS2004: INTERNATIONAL CONFERENCE OF THE LEARNING SCIENCES, PROCEEDINGS: EMBRACING DIVERSITY IN THE LEARNING SCIENCES, 2004, : 607 - 607
  • [19] MRCap: Multi-modal and Multi-level Relationship-based Dense Video Captioning
    Chen, Wei
    Niu, Jianwei
    Liu, Xuefeng
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2615 - 2620
  • [20] Single-Stream Multi-level Alignment for Vision-Language Pretraining
    Khan, Zaid
    Kumar, B. G. Vijay
    Yu, Xiang
    Schulter, Samuel
    Chandraker, Manmohan
    Fu, Yun
    COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 735 - 751