CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering

Cited by: 4
Authors
Shen, Shuai [1 ,2 ]
Li, Wanhua [1 ,2 ]
Wang, Xiaobing [3 ]
Zhang, Dafeng [3 ]
Jin, Zhezhu [3 ]
Zhou, Jie [1 ,2 ]
Lu, Jiwen [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
[2] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[3] Samsung Res China Beijing, Beijing, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023
Funding
National Natural Science Foundation of China;
DOI
10.1109/ICCV51070.2023.01900
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the most important yet rarely studied challenges for supervised face clustering is the large intra-class variance caused by different face attributes such as age, pose, and expression. Images of the same identity but with different face attributes tend to be clustered into different sub-clusters. For the first time, we propose an attribute hallucination framework, named CLIP-Cluster, to address this issue: it first hallucinates multiple representations for different attributes with the powerful CLIP model and then pools them by learning neighbor-adaptive attention. Specifically, CLIP-Cluster introduces a text-driven attribute hallucination module that uses natural language as the interface to hallucinate novel attributes for a given face image, exploiting the well-aligned image-language CLIP space. Furthermore, we develop a neighbor-aware proxy generator that fuses the features describing various attributes into a single proxy feature, building a bridge among different sub-clusters and reducing the intra-class variance. The proxy feature is generated by adaptively attending to the hallucinated visual features and the source feature based on local neighbor information. A graph built on these proxy representations then drives the subsequent clustering. Extensive experiments show that our approach outperforms state-of-the-art face clustering methods with high inference efficiency.
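
To make the two-stage pipeline concrete, here is a minimal sketch assuming OpenAI's CLIP package (https://github.com/openai/CLIP). The backbone choice, the attribute prompts, the shift strength alpha, and the feature-shift hallucination rule are all illustrative assumptions: the paper learns a text-driven hallucination module and the neighbor-adaptive attention, rather than using these hand-crafted stand-ins.

import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)  # backbone choice is an assumption

# Natural-language interface: one hypothetical prompt per attribute to hallucinate.
prompts = ["an old face", "a young face", "a smiling face", "a face in profile"]
with torch.no_grad():
    text_feats = F.normalize(
        model.encode_text(clip.tokenize(prompts).to(device)).float(), dim=-1
    )  # (A, d): one text direction per attribute

def hallucinate(face_feat: torch.Tensor, alpha: float = 0.3) -> torch.Tensor:
    """Hallucinate one representation per attribute by shifting the source
    CLIP feature along each text direction in the joint image-language space
    (a crude stand-in for the paper's learned hallucination module)."""
    face_feat = F.normalize(face_feat, dim=-1)                   # (d,)
    return F.normalize(face_feat + alpha * text_feats, dim=-1)   # (A, d)

def proxy_feature(face_feat: torch.Tensor, neighbor_feats: torch.Tensor) -> torch.Tensor:
    """Neighbor-aware proxy: attend over the source feature and its
    hallucinated variants, with weights driven by the local neighborhood."""
    candidates = torch.cat(
        [F.normalize(face_feat, dim=-1).unsqueeze(0), hallucinate(face_feat)], dim=0
    )                                                            # (1+A, d)
    ctx = F.normalize(neighbor_feats.mean(dim=0), dim=-1)        # (d,) neighborhood context
    attn = F.softmax(candidates @ ctx, dim=0)                    # (1+A,) attention weights
    return F.normalize((attn.unsqueeze(1) * candidates).sum(dim=0), dim=0)

A k-nearest-neighbor graph built over such proxy features (e.g. with sklearn.neighbors.NearestNeighbors) would then feed the clustering step, so that sub-clusters of one identity with different attributes are bridged before graph partitioning.
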
Pages: 20729-20738
Page count: 10
Related Papers
24 items in total
  • [21] CMMF-Net: a generative network based on CLIP-guided multi-modal feature fusion for thermal infrared image colorization
    Jiang, Qian
    Zhou, Tao
    He, Youwei
    Ma, Wenjun
    Hou, Jingyu
    Ghani, Ahmad Shahrizan Abdul
    Miao, Shengfa
    Jin, Xin
    INTELLIGENCE & ROBOTICS, 2025, 5 (01): 34-49
  • [22] Radio-guided vs clip-guided localization of nonpalpable mass-like lesions of the breast from a screened population: A propensity score-matched study
    Corsi, Fabio
    Bossi, Daniela
    Combi, Francesca
    Papadopoulou, Ourania
    Amadori, Rosella
    Regolo, Lea
    Trifiro, Giuseppe
    Albasini, Sara
    Mazzucchelli, Serena
    Sorrentino, Luca
    JOURNAL OF SURGICAL ONCOLOGY, 2019, 119 (07): 916-924
  • [23] AGA-GAN: Attribute Guided Attention Generative Adversarial Network with U-Net for face hallucination
    Srivastava, Abhishek
    Chanda, Sukalpa
    Pal, Umapada
    IMAGE AND VISION COMPUTING, 2022, 126
  • [24] CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable and Controllable Text-Guided Face Manipulation
    Zhou, Chenliang
    Zhong, Fangcheng
    Oztireli, Cengiz
    PROCEEDINGS OF SIGGRAPH 2023 CONFERENCE PAPERS, SIGGRAPH 2023, 2023