Social Value Alignment in Large Language Models

Authors
Abbo, Giulio Antonio [1]
Marchesi, Serena [2]
Wykowska, Agnieszka [2]
Belpaeme, Tony [1]
Affiliations
[1] Univ Ghent, Imec, IDLab AIRO, Ghent, Belgium
[2] S4HRI Ist Italiano Tecnol, Genoa, Italy
Keywords
Values; Large Language Models; LLM; Alignment; MIND;
DOI
10.1007/978-3-031-58202-8_6
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in text generation and display an apparent understanding of both physical and social aspects of the world. In this study, we examine the ability of LLMs to generate responses that align with human values. We focus on five prominent LLMs - GPT-3, GPT-4, PaLM-2, LLaMA-2 and BLOOM - and compare their generated responses with those provided by human participants. To evaluate the value alignment of the LLMs, we presented domestic scenarios to each model and elicited a response with minimal prompting instructions. Human raters judged the responses on appropriateness and value alignment. The results revealed that GPT-3, GPT-4 and PaLM-2 performed on par with human participants, displaying a notable level of value alignment in their generated responses. However, LLaMA-2 and BLOOM fell short in this respect, indicating a possible divergence from human values. Furthermore, our findings indicate that the raters had difficulty distinguishing between responses generated by LLMs and those written by humans, and in certain cases preferred the machine-generated responses. These findings shed light on the capabilities of state-of-the-art LLMs to align with human values, and also allow us to speculate on whether these models could be value-aware. This research contributes to the ongoing exploration of LLMs' understanding of ethical considerations and provides insights into their potential for engaging in value-driven interactions.
Pages: 83-97
Page count: 15