Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture

Cited by: 0
Authors
Klein, Jonathan [1]
Waller, Rebekah [2]
Pirk, Soeren [3]
Palubicki, Wojtek [4]
Tester, Mark [2]
Michels, Dominik L. [1]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Computat Sci Grp, Thuwal, Saudi Arabia
[2] King Abdullah Univ Sci & Technol KAUST, Ctr Desert Agr, Thuwal, Saudi Arabia
[3] Christian Albrechts Univ Kiel, Inst Comp Sci, Kiel, Germany
[4] Adam Mickiewicz Univ, Fac Math & Comp Sci, Poznan, Poland
Keywords
artificial intelligence; data generation and annotation; disease detection; greenhouse farming; machine learning; synthetic data; tomato plants
DOI
10.3389/fpls.2024.1360113
Chinese Library Classification (CLC) number
Q94 [Botany]
Subject classification code
071001
Abstract
The rise of artificial intelligence (AI), and in particular of modern machine learning (ML) algorithms, during the last decade has been met with great interest in the agricultural industry. While these algorithms are undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection and annotation of real datasets are the main cost drivers of ML development, and while promising results have been shown with synthetically generated training data, generating such data comes with difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of the similarity between real and synthetic data, we introduce a structured, quantitative approach. Our evaluation shows superior generalization compared with training on non-task-specific real data, and a higher development cost efficiency compared with traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
Pages: 16