Human Preference Score: Better Aligning Text-to-image Models with Human Preference

Cited by: 11
Authors
Wu, Xiaoshi [1 ]
Sun, Keqiang [1 ]
Zhu, Feng [2 ]
Zhao, Rui [2 ,3 ]
Li, Hongsheng [1 ,4 ,5 ]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] Sensetime Res, Beijing, Peoples R China
[3] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai, Peoples R China
[4] Ctr Perceptual & Interact Intelligence (CPII), Shanghai, Peoples R China
[5] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Key R&D Program of China;
Keywords
DOI
10.1109/ICCV51070.2023.00200
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences, such as awkward combinations of limbs and facial expressions. To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices. Thus, we train a human preference classifier with the collected dataset and derive a Human Preference Score (HPS) based on the classifier. Using HPS, we propose a simple yet effective method to adapt Stable Diffusion to better align with human preferences. Our experiments show that HPS outperforms CLIP in predicting human choices and has good generalization capability toward images generated from other models. By tuning Stable Diffusion with the guidance of HPS, the adapted model is able to generate images that are more preferred by human users. The project page is available here: https://tgxs002.github.io/alignsd-web/.
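The abstract describes training a CLIP-style classifier on human choices and using the resulting Human Preference Score (HPS) to rank generated images. The sketch below illustrates only that ranking step under stated assumptions: it uses the open_clip library with generic "openai" CLIP weights rather than the authors' preference-tuned checkpoint, and the prompt, candidate file names, and the preference_score helper are hypothetical placeholders, not the authors' released code or data.

```python
# Minimal sketch (not the authors' implementation): rank candidate images for a
# prompt with a CLIP-style scorer, keeping the candidate the scorer prefers.
import torch
import open_clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a ViT-L/14 CLIP backbone. To approximate HPS one would load the
# preference-tuned weights trained on human choices instead of the generic
# "openai" weights used here purely for illustration.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="openai", device=device
)
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model.eval()


def preference_score(prompt: str, image_path: str) -> float:
    """Cosine similarity between prompt and image embeddings, used here as a
    stand-in for a human-preference score."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = tokenizer([prompt]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ txt_feat.T).item()


# Rank two candidate generations for the same prompt (file names are placeholders).
prompt = "a watercolor painting of a fox in a forest"
candidates = ["cand_a.png", "cand_b.png"]
scores = {path: preference_score(prompt, path) for path in candidates}
print(max(scores, key=scores.get), scores)
```

The same kind of score is what the abstract says is used to guide the adaptation of Stable Diffusion, i.e., steering generation toward images the preference model rates more highly; the exact tuning procedure is described in the paper itself, not reproduced here.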
Pages: 2096 - 2105
Number of pages: 10