Machine learning and the politics of synthetic data

Cited by: 26
Authors
Jacobsen, Benjamin N. [1]
Affiliations
[1] Univ Durham, Dept Geog, South Rd, Durham DH1 3LE, England
Funding
European Research Council;
Keywords
Machine learning; data; algorithms; risk; ethics; variability;
DOI
10.1177/20539517221145372
Chinese Library Classification (CLC) number
C [Social Sciences: General];
Discipline classification code
03 ; 0303 ;
Abstract
Machine-learning algorithms have become deeply embedded in contemporary society. As such, ample attention has been paid to the contents, biases, and underlying assumptions of the training datasets that many algorithmic models are trained on. Yet, what happens when algorithms are trained on data that are not real, but instead data that are 'synthetic', not referring to real persons, objects, or events? Increasingly, synthetic data are being incorporated into the training of machine-learning algorithms for use in various societal domains. There is currently little understanding, however, of the role played by and the ethicopolitical implications of synthetic training data for machine-learning algorithms. In this article, I explore the politics of synthetic data through two central aspects: first, synthetic data promise to emerge as a rich source of exposure to variability for the algorithm. Second, the paper explores how synthetic data promise to place algorithms beyond the realm of risk. I propose that an analysis of these two areas will help us better understand the ways in which machine-learning algorithms are envisioned in the light of synthetic data, but also how synthetic training data actively reconfigure the conditions of possibility for machine learning in contemporary society.
Pages: 12
Related papers
50 in total
  • [21] Comparison of Machine Learning Models Trained on Synthetic Radiomic and Clinical Data
    Tu, Lorna
    Choi, Herve H. F.
    Clark, Haley
    Lloyd, Samantha A. M.
    MEDICAL PHYSICS, 2022, 49 (08) : 5689 - 5690
  • [22] UTILIZING SYNTHETIC DATA FOR VV&C OF MACHINE LEARNING APPLICATIONS
    Fox, Kevin L.
    Niewoehner, Kevin R.
    Rahmes, Mark D.
    Razdan, Rahul
    2022 INTEGRATED COMMUNICATION, NAVIGATION AND SURVEILLANCE CONFERENCE (ICNS), 2022
  • [23] Machine learning for ULCF life prediction of structural steels with synthetic data
    Yu, Mingming
    Li, Shuailing
    Xie, Xu
    JOURNAL OF CONSTRUCTIONAL STEEL RESEARCH, 2025, 224
  • [24] Conditional Synthetic Data Generation for Robust Machine Learning Applications with Limited Pandemic Data
    Das, Hari Prasanna
    Tran, Ryan
    Singh, Japjot
    Yue, Xiangyu
    Tison, Geoffrey
    Sangiovanni-Vincentelli, Alberto
    Spanos, Costas J.
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 11792 - 11800
  • [25] Politics of data reuse in machine learning systems: Theorizing reuse entanglements
    Thylstrup, Nanna Bonde
    Hansen, Kristian Bondo
    Flyverbom, Mikkel
    Amoore, Louise
    BIG DATA & SOCIETY, 2022, 9 (02)
  • [26] Machine Learning and Synthetic Minority Oversampling Techniques for Imbalanced Data: Improving Machine Failure Prediction
    Wah, Yap Bee
    Ismail, Azlan
    Azid, Nur Niswah Naslina
    Jaafar, Jafreezal
    Aziz, Izzatdin Abdul
    Hasan, Mohd Hilmi
    Zain, Jasni Mohamad
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03) : 4821 - 4841
  • [27] MLReal: Bridging the gap between training on synthetic data and real data applications in machine learning
    Alkhalifah, Tariq
    Wang, Hanchen
    Ovcharenko, Oleg
    ARTIFICIAL INTELLIGENCE IN GEOSCIENCES, 2022, 3 : 101 - 114
  • [28] Measuring the Effect of Categorical Encoders in Machine Learning Tasks Using Synthetic Data
    Valdez-Valenzuela, Eric
    Kuri-Morales, Angel
    Gomez-Adorno, Helena
    ADVANCES IN COMPUTATIONAL INTELLIGENCE (MICAI 2021), PT I, 2021, 13067 : 92 - 107
  • [29] Deep Aramaic: Towards a synthetic data paradigm enabling machine learning in epigraphy
    Aioanei, Andrei C.
    Hunziker-Rodewald, Regine R.
    Klein, Konstantin M.
    Michels, Dominik L.
    PLOS ONE, 2024, 19 (04)
  • [30] Imperfection Sensitivity Detection in Pultruded Columns Using Machine Learning and Synthetic Data
    Tzimas, Michail
    Barbero, Ever J.
    BUILDINGS, 2024, 14 (04)