CLIP-Cluster: CLIP-Guided Attribute Hallucination for Face Clustering

Cited by: 4
Authors
Shen, Shuai [1 ,2 ]
Li, Wanhua [1 ,2 ]
Wang, Xiaobing [3 ]
Zhang, Dafeng [3 ]
Jin, Zhezhu [3 ]
Zhou, Jie [1 ,2 ]
Lu, Jiwen [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Dept Automat, Beijing, Peoples R China
[2] Beijing Natl Res Ctr Informat Sci & Technol, Beijing, Peoples R China
[3] Samsung Res China Beijing, Beijing, Peoples R China
Source
2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023
Funding
National Natural Science Foundation of China;
DOI
10.1109/ICCV51070.2023.01900
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the most important yet rarely studied challenges for supervised face clustering is the large intra-class variance caused by different face attributes such as age, pose, and expression. Images of the same identity but with different face attributes tend to be clustered into different sub-clusters. For the first time, we propose an attribute hallucination framework, named CLIP-Cluster, to address this issue: it first hallucinates multiple representations for different attributes with the powerful CLIP model and then pools them by learning neighbor-adaptive attention. Specifically, CLIP-Cluster introduces a text-driven attribute hallucination module that uses natural language as the interface to hallucinate novel attributes for a given face image, exploiting the well-aligned image-language CLIP space. Furthermore, we develop a neighbor-aware proxy generator that fuses the features describing various attributes into a single proxy feature, building a bridge among different sub-clusters and reducing the intra-class variance. The proxy feature is generated by adaptively attending to the hallucinated visual features and the source feature based on local neighbor information. A graph built on these proxy representations then drives the subsequent clustering. Extensive experiments show that our approach outperforms state-of-the-art face clustering methods with high inference efficiency.
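
To make the two-stage pipeline concrete, here is a minimal sketch assuming OpenAI's CLIP package (https://github.com/openai/CLIP). The backbone choice, the attribute prompts, the shift strength alpha, and the feature-shift hallucination rule are all illustrative assumptions: the paper learns a text-driven hallucination module and the neighbor-adaptive attention, rather than using these hand-crafted stand-ins.

import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)  # backbone choice is an assumption

# Natural-language interface: one hypothetical prompt per attribute to hallucinate.
prompts = ["an old face", "a young face", "a smiling face", "a face in profile"]
with torch.no_grad():
    text_feats = F.normalize(
        model.encode_text(clip.tokenize(prompts).to(device)).float(), dim=-1
    )  # (A, d): one text direction per attribute

def hallucinate(face_feat: torch.Tensor, alpha: float = 0.3) -> torch.Tensor:
    """Hallucinate one representation per attribute by shifting the source
    CLIP feature along each text direction in the joint image-language space
    (a crude stand-in for the paper's learned hallucination module)."""
    face_feat = F.normalize(face_feat, dim=-1)                   # (d,)
    return F.normalize(face_feat + alpha * text_feats, dim=-1)   # (A, d)

def proxy_feature(face_feat: torch.Tensor, neighbor_feats: torch.Tensor) -> torch.Tensor:
    """Neighbor-aware proxy: attend over the source feature and its
    hallucinated variants, with weights driven by the local neighborhood."""
    candidates = torch.cat(
        [F.normalize(face_feat, dim=-1).unsqueeze(0), hallucinate(face_feat)], dim=0
    )                                                            # (1+A, d)
    ctx = F.normalize(neighbor_feats.mean(dim=0), dim=-1)        # (d,) neighborhood context
    attn = F.softmax(candidates @ ctx, dim=0)                    # (1+A,) attention weights
    return F.normalize((attn.unsqueeze(1) * candidates).sum(dim=0), dim=0)

A k-nearest-neighbor graph built over such proxy features (e.g. with sklearn.neighbors.NearestNeighbors) would then feed the clustering step, so that sub-clusters of one identity with different attributes are bridged before graph partitioning.
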
Pages: 20729-20738
Page count: 10
Related Papers
24 items in total
  • [21] CMMF-Net: a generative network based on CLIP-guided multi-modal feature fusion for thermal infrared image colorization
    Jiang, Qian
    Zhou, Tao
    He, Youwei
    Ma, Wenjun
    Hou, Jingyu
    Ghani, Ahmad Shahrizan Abdul
    Miao, Shengfa
    Jin, Xin
    INTELLIGENCE & ROBOTICS, 2025, 5 (01): 34-49
  • [22] Radio-guided vs clip-guided localization of nonpalpable mass-like lesions of the breast from a screened population: A propensity score-matched study
    Corsi, Fabio
    Bossi, Daniela
    Combi, Francesca
    Papadopoulou, Ourania
    Amadori, Rosella
    Regolo, Lea
    Trifiro, Giuseppe
    Albasini, Sara
    Mazzucchelli, Serena
    Sorrentino, Luca
    JOURNAL OF SURGICAL ONCOLOGY, 2019, 119 (07): 916-924
  • [23] AGA-GAN: Attribute Guided Attention Generative Adversarial Network with U-Net for face hallucination
    Srivastava, Abhishek
    Chanda, Sukalpa
    Pal, Umapada
    IMAGE AND VISION COMPUTING, 2022, 126
  • [24] CLIP-PAE: Projection-Augmentation Embedding to Extract Relevant Features for a Disentangled, Interpretable and Controllable Text-Guided Face Manipulation
    Zhou, Chenliang
    Zhong, Fangcheng
    Oztireli, Cengiz
    PROCEEDINGS OF SIGGRAPH 2023 CONFERENCE PAPERS, SIGGRAPH 2023, 2023