From Region to Patch: Attribute-Aware Foreground-Background Contrastive Learning for Fine-Grained Fashion Retrieval

Cited by: 7

Authors:

Dong, Jianfeng [1]

Peng, Xiaoman [2]

Ma, Zhe [3]

Liu, Daizong [4]

Qu, Xiaoye [5]

Yang, Xun [6]

Zhu, Jixiang [2]

Liu, Baolong [1]

Affiliations:

[1] Zhejiang Gongshang Univ, Zhejiang Key Lab E Commerce, Hangzhou, Peoples R China

[2] Zhejiang Gongshang Univ, Hangzhou, Peoples R China

[3] Zhejiang Univ, Hangzhou, Peoples R China

[4] Peking Univ, Beijing, Peoples R China

[5] Huazhong Univ Sci & Technol, Wuhan, Peoples R China

[6] Univ Sci & Technol China, Hefei, Peoples R China

Source:

PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023 | 2023

Keywords:

Fashion Retrieval; Fine-Grained Similarity; Image Retrieval;

DOI:

10.1145/3539618.3591690

Chinese Library Classification:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Attribute-specific fashion retrieval (ASFR) is a challenging information retrieval task that has attracted increasing attention in recent years. Unlike traditional fashion retrieval, which mainly focuses on optimizing holistic similarity, the ASFR task concentrates on attribute-specific similarity, resulting in more fine-grained and interpretable retrieval results. As attribute-specific similarity typically corresponds to specific subtle regions of images, we propose a Region-to-Patch Framework (RPF) that consists of a region-aware branch and a patch-aware branch to extract fine-grained attribute-related visual features for precise retrieval in a coarse-to-fine manner. In particular, the region-aware branch is first used to locate the potential regions related to the semantics of the given attribute. Then, considering that the located region is coarse and still contains background visual content, the patch-aware branch is proposed to capture patch-wise attribute-related details from the previously amplified region. Such a hybrid architecture strikes a proper balance between region localization and feature extraction. Besides, unlike previous works that focus solely on discriminating the attribute-relevant foreground visual features, we argue that the attribute-irrelevant background features are also crucial for distinguishing the detailed visual contexts in a contrastive manner. Therefore, a novel E-InfoNCE loss based on the foreground and background representations is further proposed to improve the discrimination of the attribute-specific representation. Extensive experiments on three datasets demonstrate the effectiveness of our proposed framework and also show decent generalization of our RPF to out-of-domain fashion images. Our source code is available at https://github.com/HuiGuanLab/RPF.
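
The abstract does not state the form of the E-InfoNCE loss, so the following is not the authors' formulation. It is a minimal sketch of one plausible reading: a standard InfoNCE loss in which background features are appended to the negative set, so the foreground anchor is pushed away from attribute-irrelevant content as well as from other samples. The function name `info_nce_with_background`, its parameters, and the temperature value are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_with_background(anchor, positive, negatives, backgrounds,
                             temperature=0.1):
    """InfoNCE-style loss with background features as extra negatives.

    Hypothetical illustration only; the paper's E-InfoNCE may differ.
    anchor, positive: foreground feature vectors of a matching pair.
    negatives: foreground features of non-matching samples.
    backgrounds: attribute-irrelevant background features.
    """
    a = l2_normalize(anchor)
    pos_logit = dot(a, l2_normalize(positive)) / temperature
    neg_logits = [dot(a, l2_normalize(n)) / temperature
                  for n in list(negatives) + list(backgrounds)]
    logits = [pos_logit] + neg_logits
    # Numerically stable log-sum-exp over positive and all negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - pos_logit
```

Under this reading, adding background vectors can only enlarge the denominator, so the loss is smallest when the foreground anchor is similar to its positive and dissimilar to both other samples and background content.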

Pages: 1273-1282

Page count: 10
