Human Preference Score: Better Aligning Text-to-image Models with Human Preference

Cited by: 11
Authors
Wu, Xiaoshi [1 ]
Sun, Keqiang [1 ]
Zhu, Feng [2 ]
Zhao, Rui [2 ,3 ]
Li, Hongsheng [1 ,4 ,5 ]
Affiliations
[1] Chinese Univ Hong Kong, Multimedia Lab, Hong Kong, Peoples R China
[2] Sensetime Res, Beijing, Peoples R China
[3] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai, Peoples R China
[4] Ctr Perceptual & Interact Intelligence (CPII), Shanghai, Peoples R China
[5] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Key R&D Program of China;
Keywords
DOI
10.1109/ICCV51070.2023.00200
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent years have witnessed a rapid growth of deep generative models, with text-to-image models gaining significant attention from the public. However, existing models often generate images that do not align well with human preferences, such as awkward combinations of limbs and facial expressions. To address this issue, we collect a dataset of human choices on generated images from the Stable Foundation Discord channel. Our experiments demonstrate that current evaluation metrics for generative models do not correlate well with human choices. Thus, we train a human preference classifier with the collected dataset and derive a Human Preference Score (HPS) based on the classifier. Using HPS, we propose a simple yet effective method to adapt Stable Diffusion to better align with human preferences. Our experiments show that HPS outperforms CLIP in predicting human choices and has good generalization capability toward images generated from other models. By tuning Stable Diffusion with the guidance of HPS, the adapted model is able to generate images that are more preferred by human users. The project page is available here: https://tgxs002.github.io/alignsd-web/.
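The abstract describes training a CLIP-style classifier on human choices and using the resulting Human Preference Score (HPS) to rank generated images. The sketch below illustrates only that ranking step under stated assumptions: it uses the open_clip library with generic "openai" CLIP weights rather than the authors' preference-tuned checkpoint, and the prompt, candidate file names, and the preference_score helper are hypothetical placeholders, not the authors' released code or data.

```python
# Minimal sketch (not the authors' implementation): rank candidate images for a
# prompt with a CLIP-style scorer, keeping the candidate the scorer prefers.
import torch
import open_clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a ViT-L/14 CLIP backbone. To approximate HPS one would load the
# preference-tuned weights trained on human choices instead of the generic
# "openai" weights used here purely for illustration.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="openai", device=device
)
tokenizer = open_clip.get_tokenizer("ViT-L-14")
model.eval()


def preference_score(prompt: str, image_path: str) -> float:
    """Cosine similarity between prompt and image embeddings, used here as a
    stand-in for a human-preference score."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = tokenizer([prompt]).to(device)
    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ txt_feat.T).item()


# Rank two candidate generations for the same prompt (file names are placeholders).
prompt = "a watercolor painting of a fox in a forest"
candidates = ["cand_a.png", "cand_b.png"]
scores = {path: preference_score(prompt, path) for path in candidates}
print(max(scores, key=scores.get), scores)
```

The same kind of score is what the abstract says is used to guide the adaptation of Stable Diffusion, i.e., steering generation toward images the preference model rates more highly; the exact tuning procedure is described in the paper itself, not reproduced here.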
Pages: 2096 - 2105
Number of pages: 10