Reclaiming the Digital Commons: A Public Data Trust for Training Data

Cited by: 3
Authors
Chan, Alan [1]
Bradley, Herbie [2]
Rajkumar, Nitarshan [3]
Affiliations
[1] Univ Montreal, Montreal, PQ, Canada
[2] Univ Cambridge, EleutherAI, Cambridge, England
[3] Univ Cambridge, Cambridge, England
Keywords
data trust; training data; data rights; digital commons
DOI
10.1145/3600211.3604658
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Democratization of AI means not only that people can freely use AI, but also that people can collectively decide how AI is to be used. In particular, collective decision-making power is required to redress the negative externalities from the development of increasingly advanced AI systems, including degradation of the digital commons and unemployment from automation. The rapid pace of AI development and deployment currently leaves little room for this power. Monopolized in the hands of private corporations, the development of the most capable foundation models has proceeded largely without public input. There is currently no implemented mechanism for ensuring that the economic value generated by such models is redistributed to account for their negative externalities. The citizens that have generated the data necessary to train models do not have input on how their data are to be used. In this work, we propose that a public data trust assert control over training data for foundation models. In particular, this trust should scrape the internet as a digital commons, to license to commercial model developers for a percentage cut of revenues from deployment. First, we argue in detail for the existence of such a trust. We also discuss feasibility and potential risks. Second, we detail a number of ways for a data trust to incentivize model developers to use training data only from the trust. We propose a mix of verification mechanisms, potential regulatory action, and positive incentives. We conclude by highlighting other potential benefits of our proposed data trust and connecting our work to ongoing efforts in data and compute governance.
Pages: 855-868
Page count: 14