Machine learning and the politics of synthetic data

Cited by: 26
Author
Jacobsen, Benjamin N. [1]
Affiliation
[1] Univ Durham, Dept Geog, South Rd, Durham DH1 3LE, England
Funding
European Research Council
Keywords
Machine learning; data; algorithms; risk; ethics; variability
DOI
10.1177/20539517221145372
Chinese Library Classification
C [Social Sciences, General]
Discipline classification codes
03; 0303
Abstract
Machine-learning algorithms have become deeply embedded in contemporary society. As such, ample attention has been paid to the contents, biases, and underlying assumptions of the training datasets that many algorithmic models are trained on. Yet, what happens when algorithms are trained on data that are not real, but instead data that are 'synthetic', not referring to real persons, objects, or events? Increasingly, synthetic data are being incorporated into the training of machine-learning algorithms for use in various societal domains. There is currently little understanding, however, of the role played by and the ethicopolitical implications of synthetic training data for machine-learning algorithms. In this article, I explore the politics of synthetic data through two central aspects: first, synthetic data promise to emerge as a rich source of exposure to variability for the algorithm. Second, the paper explores how synthetic data promise to place algorithms beyond the realm of risk. I propose that an analysis of these two areas will help us better understand the ways in which machine-learning algorithms are envisioned in the light of synthetic data, but also how synthetic training data actively reconfigure the conditions of possibility for machine learning in contemporary society.
Pages: 12