Machine learning and the politics of synthetic data

Times cited: 26
Author
Jacobsen, Benjamin N. [1]
Affiliation
[1] Univ Durham, Dept Geog, South Rd, Durham DH1 3LE, England
Funding
European Research Council
Keywords
Machine learning; data; algorithms; risk; ethics; variability
DOI
10.1177/20539517221145372
Chinese Library Classification (CLC) number
C [Social sciences, general]
Discipline classification codes
03; 0303
Abstract
Machine-learning algorithms have become deeply embedded in contemporary society. Accordingly, ample attention has been paid to the contents, biases, and underlying assumptions of the datasets on which many algorithmic models are trained. Yet what happens when algorithms are trained on data that are not real but 'synthetic', referring to no real persons, objects, or events? Synthetic data are increasingly being incorporated into the training of machine-learning algorithms for use in various societal domains. There is currently little understanding, however, of the role that synthetic training data play in machine learning or of their ethicopolitical implications. In this article, I explore the politics of synthetic data through two central aspects. First, synthetic data promise to emerge as a rich source of exposure to variability for the algorithm. Second, synthetic data promise to place algorithms beyond the realm of risk. I propose that an analysis of these two areas will help us better understand not only how machine-learning algorithms are envisioned in light of synthetic data, but also how synthetic training data actively reconfigure the conditions of possibility for machine learning in contemporary society.
Pages: 12
Related papers
  • [1] Machine Learning, Synthetic Data, and the Politics of Difference
    Jacobsen, Benjamin N.
    THEORY CULTURE & SOCIETY, 2025
  • [2] Synthetic satellite telemetry data for machine learning
    Schefels, Clemens
    Schlag, Leonard
    Helmsauer, Kathrin
    CEAS SPACE JOURNAL, 2025
  • [3] Synthetic data in machine learning for medicine and healthcare
    Chen, Richard J.
    Lu, Ming Y.
    Chen, Tiffany Y.
    Williamson, Drew F. K.
    Mahmood, Faisal
    NATURE BIOMEDICAL ENGINEERING, 2021, 5 (06): 493-497
  • [4] A Survey of Synthetic Data Generation for Machine Learning
    Abufadda, Mohammad
    Mansour, Khalid
    2021 22ND INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2021: 488-494
  • [5] AUTOMATED MACHINE LEARNING & SYNTHETIC DATA APPLICATIONS IN MEDICINE
    Rashidi, Hooman
    INTERNATIONAL JOURNAL OF LABORATORY HEMATOLOGY, 2023, 45: 93
  • [6] Synthetic data enable experiments in atomistic machine learning
    Gardner, John L. A.
    Beaulieu, Zoe Faure
    Deringer, Volker L.
    DIGITAL DISCOVERY, 2023, 2 (03): 651-662
  • [7] Synthetic data as an enabler for machine learning applications in medicine
    Rajotte, Jean-Francois
    Bergen, Robert
    Buckeridge, David L.
    El Emam, Khaled
    Ng, Raymond
    Strome, Elissa
    ISCIENCE, 2022, 25 (11)
  • [8] Incentivizing Collaboration in Machine Learning via Synthetic Data Rewards
    Tay, Sebastian Shenghong
    Xu, Xinyi
    Foo, Chuan Sheng
    Low, Bryan Kian Hsiang
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022: 9448-9456