Judging facts, judging norms: Training machine learning models to judge humans requires a modified approach to labeling data

Cited by: 8
Authors
Balagopalan, Aparna [1 ]
Madras, David [2 ,3 ]
Yang, David H. [2 ,4 ]
Hadfield-Menell, Dylan [1 ]
Hadfield, Gillian K. [2 ,3 ,5 ,6 ,7 ]
Ghassemi, Marzyeh [1 ,2 ,3 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Univ Toronto, Toronto, ON, Canada
[3] Vector Inst, Toronto, ON, Canada
[4] ML Estimat, Toronto, ON, Canada
[5] Schwartz Reisman Inst Technol & Soc, Toronto, ON, Canada
[6] Ctr Human Compatible AI, Berkeley, CA USA
[7] OpenAI, San Francisco, CA USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
DOI
10.1126/sciadv.abq0701
CLC Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline Codes
07; 0710; 09;
Abstract
As governments and industry turn to increased use of automated decision systems, it becomes essential to consider how closely such systems can reproduce human judgment. We identify a core potential failure, finding that annotators label objects differently depending on whether they are being asked a factual question or a normative question. This challenges a natural assumption maintained in many standard machine-learning (ML) data acquisition procedures: that there is no difference between predicting the factual classification of an object and an exercise of judgment about whether an object violates a rule premised on those facts. We find that using factual labels to train models intended for normative judgments introduces a notable measurement error. We show that models trained using factual labels yield significantly different judgments than those trained using normative labels and that the impact of this effect on model performance can exceed that of other factors (e.g., dataset size) that routinely attract attention from ML researchers and practitioners.
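The abstract's core comparison (same objects, two label sets) can be sketched in code. The following is a minimal illustration only, not the authors' method or data: it assumes a synthetic setting in which normative annotators apply a more lenient threshold than factual annotators, trains two otherwise identical classifiers on the two label sets, and measures how often their judgments disagree on held-out objects. All feature names, thresholds, and model choices here are assumptions for illustration.

```python
# Minimal sketch (not the paper's code): training on factual vs. normative
# labels for the same objects and comparing the resulting models' judgments.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic objects: two features standing in for observable attributes.
X = rng.normal(size=(2000, 2))

# Factual labels: does the object have the attribute? (threshold on feature 0)
y_factual = (X[:, 0] > 0.0).astype(int)

# Normative labels: an assumed effect where annotators judging rule
# violations are more lenient, shifting the effective threshold.
y_normative = (X[:, 0] > 0.5).astype(int)

X_tr, X_te, yf_tr, yf_te, yn_tr, yn_te = train_test_split(
    X, y_factual, y_normative, test_size=0.5, random_state=0
)

model_factual = LogisticRegression().fit(X_tr, yf_tr)
model_normative = LogisticRegression().fit(X_tr, yn_tr)

# Disagreement rate between the two models on held-out objects: a simple
# proxy for the measurement error the abstract describes.
disagreement = np.mean(model_factual.predict(X_te) != model_normative.predict(X_te))
print(f"disagreement rate: {disagreement:.3f}")
```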
Pages: 14