Machine learning and the politics of synthetic data

Cited by: 26
Author
Jacobsen, Benjamin N. [1]
Affiliation
[1] Univ Durham, Dept Geog, South Rd, Durham DH1 3LE, England
Funding
European Research Council
Keywords
Machine learning; data; algorithms; risk; ethics; variability
DOI
10.1177/20539517221145372
Chinese Library Classification number
C [Social sciences, general]
Discipline classification codes
03; 0303
Abstract
Machine-learning algorithms have become deeply embedded in contemporary society. As such, ample attention has been paid to the contents, biases, and underlying assumptions of the training datasets that many algorithmic models are trained on. Yet, what happens when algorithms are trained on data that are not real, but instead data that are 'synthetic', not referring to real persons, objects, or events? Increasingly, synthetic data are being incorporated into the training of machine-learning algorithms for use in various societal domains. There is currently little understanding, however, of the role played by and the ethicopolitical implications of synthetic training data for machine-learning algorithms. In this article, I explore the politics of synthetic data through two central aspects: first, synthetic data promise to emerge as a rich source of exposure to variability for the algorithm. Second, the paper explores how synthetic data promise to place algorithms beyond the realm of risk. I propose that an analysis of these two areas will help us better understand the ways in which machine-learning algorithms are envisioned in the light of synthetic data, but also how synthetic training data actively reconfigure the conditions of possibility for machine learning in contemporary society.
Pages: 12