Reclaiming the Digital Commons: A Public Data Trust for Training Data

Cited by: 3
Authors
Chan, Alan [1]
Bradley, Herbie [2]
Rajkumar, Nitarshan [3]
Affiliations
[1] Univ Montreal, Montreal, PQ, Canada
[2] Univ Cambridge, EleutherAI, Cambridge, England
[3] Univ Cambridge, Cambridge, England
Keywords
data trust; training data; data rights; digital commons
DOI
10.1145/3600211.3604658
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Democratization of AI means not only that people can freely use AI, but also that people can collectively decide how AI is to be used. In particular, collective decision-making power is required to redress the negative externalities from the development of increasingly advanced AI systems, including degradation of the digital commons and unemployment from automation. The rapid pace of AI development and deployment currently leaves little room for this power. Monopolized in the hands of private corporations, the development of the most capable foundation models has proceeded largely without public input. There is currently no implemented mechanism for ensuring that the economic value generated by such models is redistributed to account for their negative externalities. The citizens that have generated the data necessary to train models do not have input on how their data are to be used. In this work, we propose that a public data trust assert control over training data for foundation models. In particular, this trust should scrape the internet as a digital commons, to license to commercial model developers for a percentage cut of revenues from deployment. First, we argue in detail for the existence of such a trust. We also discuss feasibility and potential risks. Second, we detail a number of ways for a data trust to incentivize model developers to use training data only from the trust. We propose a mix of verification mechanisms, potential regulatory action, and positive incentives. We conclude by highlighting other potential benefits of our proposed data trust and connecting our work to ongoing efforts in data and compute governance.
Pages: 855-868
Page count: 14