Reclaiming the Digital Commons: A Public Data Trust for Training Data

Cited by: 3
Authors
Chan, Alan [1]
Bradley, Herbie [2]
Rajkumar, Nitarshan [3]
Affiliations
[1] Univ Montreal, Montreal, PQ, Canada
[2] Univ Cambridge, EleutherAI, Cambridge, England
[3] Univ Cambridge, Cambridge, England
Keywords
data trust; training data; data rights; digital commons
DOI
10.1145/3600211.3604658
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Democratization of AI means not only that people can freely use AI, but also that people can collectively decide how AI is to be used. In particular, collective decision-making power is required to redress the negative externalities from the development of increasingly advanced AI systems, including degradation of the digital commons and unemployment from automation. The rapid pace of AI development and deployment currently leaves little room for this power. Monopolized in the hands of private corporations, the development of the most capable foundation models has proceeded largely without public input. There is currently no implemented mechanism for ensuring that the economic value generated by such models is redistributed to account for their negative externalities. The citizens that have generated the data necessary to train models do not have input on how their data are to be used. In this work, we propose that a public data trust assert control over training data for foundation models. In particular, this trust should scrape the internet as a digital commons, to license to commercial model developers for a percentage cut of revenues from deployment. First, we argue in detail for the existence of such a trust. We also discuss feasibility and potential risks. Second, we detail a number of ways for a data trust to incentivize model developers to use training data only from the trust. We propose a mix of verification mechanisms, potential regulatory action, and positive incentives. We conclude by highlighting other potential benefits of our proposed data trust and connecting our work to ongoing efforts in data and compute governance.
Pages: 855-868
Number of pages: 14