Human Preference Score: Better Aligning Text-to-image Models with Human Preference

Cited by: 11
Authors
Wu, Xiaoshi [1 ]
Sun, Keqiang [1 ]
Zhu, Feng [2 ]
Zhao, Rui [2 ,3 ]
Li, Hongsheng [1 ,4 ,5 ]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] Sensetime Res, Beijing, Peoples R China
[3] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai, Peoples R China
[4] Ctr Perceptual & Interact Intelligence CPII, Shanghai, Peoples R China
[5] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Key R&D Program of China
DOI
10.1109/ICCV51070.2023.00200
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences, such as awkward combinations of limbs and facial expressions. To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices. Thus, we train a human preference classifier with the collected dataset and derive a Human Preference Score (HPS) based on the classifier. Using HPS, we propose a simple yet effective method to adapt Stable Diffusion to better align with human preferences. Our experiments show that HPS outperforms CLIP in predicting human choices and has good generalization capability toward images generated from other models. By tuning Stable Diffusion with the guidance of HPS, the adapted model is able to generate images that are more preferred by human users. The project page is available here: https://tgxs002.github.io/alignsd-web/.
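The abstract does not spell out how the score is derived from the classifier. The sketch below shows one plausible reading, stated as an assumption: the score is a scaled cosine similarity between an image embedding and a prompt embedding, and the classifier picks among candidate images with a softmax over those similarities. The function names, the scale of 100, the temperature, and the toy 2-D embeddings are all hypothetical; a real system would obtain the embeddings from a fine-tuned vision-language encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def preference_score(image_emb, text_emb, scale=100.0):
    """Hypothetical score: scaled image-prompt cosine similarity (assumed form)."""
    return scale * cosine_similarity(image_emb, text_emb)


def choice_probabilities(image_embs, text_emb, temperature=1.0):
    """Softmax over candidate images for one prompt (assumed classifier head)."""
    logits = [cosine_similarity(e, text_emb) / temperature for e in image_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


# Toy embeddings standing in for encoder outputs (illustrative values only).
prompt = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

scores = [preference_score(c, prompt) for c in candidates]
probs = choice_probabilities(candidates, prompt, temperature=0.1)
predicted_choice = probs.index(max(probs))
```

Under this reading, comparing `predicted_choice` with the image a user actually picked gives the accuracy figure on which a preference model and plain CLIP could be compared.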
Pages: 2096-2105
Page count: 10