CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering

Times Cited: 4
Authors
Shen, Shuai [1 ,2 ]
Li, Wanhua [1 ,2 ]
Wang, Xiaobing [3 ]
Zhang, Dafeng [3 ]
Jin, Zhezhu [3 ]
Zhou, Jie [1 ,2 ]
Lu, Jiwen [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
[2] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[3] Samsung Res China Beijing, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/ICCV51070.2023.01900
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
One of the most important yet rarely studied challenges for supervised face clustering is the large intra-class variance caused by differing face attributes such as age, pose, and expression. Images of the same identity but with different face attributes tend to be clustered into separate sub-clusters. To address this issue, we propose, for the first time, an attribute hallucination framework named CLIP-Cluster, which first hallucinates multiple representations for different attributes with the powerful CLIP model and then pools them by learning neighbor-adaptive attention. Specifically, CLIP-Cluster first introduces a text-driven attribute hallucination module, which uses natural language as the interface to hallucinate novel attributes for a given face image based on the well-aligned image-language CLIP space. Furthermore, we develop a neighbor-aware proxy generator that fuses the features describing various attributes into a proxy feature, building a bridge among different sub-clusters and reducing the intra-class variance. The proxy feature is generated by adaptively attending to the hallucinated visual features and the source feature based on local neighbor information. On this basis, a graph built from the proxy representations is used for subsequent clustering. Extensive experiments show that our approach outperforms state-of-the-art face clustering methods with high inference efficiency.
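The proxy-generation step described above can be illustrated with a minimal NumPy sketch. This is not the paper's actual architecture: the learned neighbor-adaptive attention is replaced here, as a stated simplification, by a softmax over each candidate feature's mean cosine similarity to its local neighbors, and all shapes and the `proxy_feature` name are hypothetical.

```python
import numpy as np


def proxy_feature(source, hallucinated, neighbors):
    """Pool a source feature and its hallucinated attribute variants
    into a single proxy feature.

    Simplified stand-in for the paper's neighbor-aware proxy generator:
    each candidate (source + hallucinated) is scored by its mean cosine
    similarity to the local neighbor features, and the candidates are
    softmax-weighted and summed.

    source:       (d,)   original face embedding
    hallucinated: (K, d) attribute-hallucinated embeddings
    neighbors:    (N, d) embeddings of local neighbors
    """
    # Stack source and hallucinated features into (K+1, d) candidates.
    candidates = np.vstack([source[None, :], hallucinated])
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    nb = neighbors / np.linalg.norm(neighbors, axis=1, keepdims=True)

    # Score each candidate by mean cosine similarity to the neighborhood.
    logits = (candidates @ nb.T).mean(axis=1)          # (K+1,)

    # Softmax over candidates (numerically stabilized).
    w = np.exp(logits - logits.max())
    w = w / w.sum()

    # Attention-weighted pooling, then re-normalize the proxy.
    proxy = w @ candidates                             # (d,)
    return proxy / np.linalg.norm(proxy)
```

In the paper the attention weights are learned rather than hand-crafted, but the sketch captures the key idea: candidates that agree with the local neighborhood dominate the proxy, pulling sub-clusters of the same identity toward a shared representation before the clustering graph is built.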
Pages: 20729-20738
Page count: 10
Related Papers
24 records in total
  • [1] CLIP-guided continual novel class discovery
    Yan, Qingsen
    Yang, Yiting
    Dai, Yutong
    Zhang, Xing
    Wiltos, Katarzyna
    Wozniak, Marcin
    Dong, Wei
    Zhang, Yanning
    KNOWLEDGE-BASED SYSTEMS, 2025, 310
  • [2] Image-Based CLIP-Guided Essence Transfer
    Chefer, Hila
    Benaim, Sagie
    Paiss, Roni
    Wolf, Lior
    COMPUTER VISION, ECCV 2022, PT XIII, 2022, 13673 : 695 - 711
  • [3] Multimodal Fake News Detection via CLIP-Guided Learning
    Zhou, Yangming
    Yang, Yuzhou
    Ying, Qichao
    Qian, Zhenxing
    Zhang, Xinpeng
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2825 - 2830
  • [4] StyleGAN-based CLIP-guided Image Shape Manipulation
    Qian, Yuchen
    Yamamoto, Kohei
    Yanai, Keiji
    19TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI 2022, 2022, : 162 - 166
  • [5] CLIP-Guided Federated Learning on Heterogeneous and Long-Tailed Data
    Shi, Jiangming
    Zheng, Shanshan
    Yin, Xiangbo
    Lu, Yang
    Xie, Yuan
    Qu, Yanyun
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024, : 14955 - 14963
  • [6] CLIP-guided black-box domain adaptation of image classification
    Tian, Liang
    Ye, Mao
    Zhou, Lihua
    He, Qichen
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (05) : 4637 - 4646
  • [7] StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators
    Gal, Rinon
    Patashnik, Or
    Maron, Haggai
    Bermano, Amit H.
    Chechik, Gal
    Cohen-Or, Daniel
    ACM TRANSACTIONS ON GRAPHICS, 2022, 41 (04):
  • [8] RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement
    Gaintseva, Tatiana
    Benning, Martin
    Slabaugh, Gregory
    COMPUTER VISION - ECCV 2024, PT LXXIX, 2025, 15137 : 412 - 428
  • [9] CLIP-guided Prototype Modulating for Few-shot Action Recognition
    Wang, Xiang
    Zhang, Shiwei
    Cen, Jun
    Gao, Changxin
    Zhang, Yingya
    Zhao, Deli
    Sang, Nong
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (06) : 1899 - 1912
  • [10] CgT-GAN: CLIP-guided Text GAN for Image Captioning
    Yu, Jiarui
    Li, Haoran
    Hao, Yanbin
    Zhu, Bin
    Xu, Tong
    He, Xiangnan
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2252 - 2263