Human Preference Score: Better Aligning Text-to-image Models with Human Preference

Times cited: 11
Authors
Wu, Xiaoshi [1]
Sun, Keqiang [1]
Zhu, Feng [2]
Zhao, Rui [2,3]
Li, Hongsheng [1,4,5]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] Sensetime Res, Beijing, Peoples R China
[3] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai, Peoples R China
[4] Ctr Perceptual & Interact Intelligence CPH, Shanghai, Peoples R China
[5] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Key R&D Program of China
DOI
10.1109/ICCV51070.2023.00200
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences, such as awkward combinations of limbs and facial expressions. To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices. Thus, we train a human preference classifier with the collected dataset and derive a Human Preference Score (HPS) based on the classifier. Using HPS, we propose a simple yet effective method to adapt Stable Diffusion to better align with human preferences. Our experiments show that HPS outperforms CLIP in predicting human choices and has good generalization capability toward images generated from other models. By tuning Stable Diffusion with the guidance of HPS, the adapted model is able to generate images that are more preferred by human users. The project page is available here: https://tgxs002.github.io/alignsd-web/.
Pages: 2096-2105
Number of pages: 10