Philosophical Investigations into AI Alignment: A Wittgensteinian Framework

Authors
José Antonio Pérez-Escobar
Deniz Sarikaya
Affiliations
[1] Centre Cavaillès, UAR 3608 République des Savoirs, École Normale Supérieure, PSL University
[2] Department of Logic, History and Philosophy of Science, UNED
[3] Department of Philosophy, University of Geneva
[4] Centre for Logic and Philosophy of Science, Vrije Universiteit Brussel (VUB)
[5] Ethical Innovation Hub, Universität zu Lübeck
Keywords
Later Wittgenstein; Alignment Problem; AI safety; Meaning as use; Rule bending
DOI
10.1007/s13347-024-00761-9
Abstract
We argue that the later Wittgenstein’s philosophy of language and mathematics, with its sustained focus on rule-following, is relevant to understanding and improving on the Artificial Intelligence (AI) alignment problem: his discussions of the categories that shape alignment between humans can indicate which categories should be controlled when creating large data sets for supervised and unsupervised learning algorithms, and when introducing hard-coded guardrails for AI models. We cast these considerations in a model of human–human and human–machine alignment and sketch basic alignment strategies based on these categories and on further reflections on rule-following, such as the notion of meaning as use. To support the validity of these considerations, we also show that successful techniques employed by AI safety researchers to align new AI systems with human goals are congruent with the stipulations we derive from the later Wittgenstein’s philosophy. Their application may nevertheless benefit from the added specificity of our framework, which extends current efforts and yields further, concrete alignment techniques. We therefore argue that the categories of the model and the core alignment strategies presented in this work can inform future AI alignment research.