Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture

Cited by: 0
Authors
Klein, Jonathan [1]
Waller, Rebekah [2]
Pirk, Soeren [3]
Palubicki, Wojtek [4]
Tester, Mark [2]
Michels, Dominik L. [1]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Computat Sci Grp, Thuwal, Saudi Arabia
[2] King Abdullah Univ Sci & Technol KAUST, Ctr Desert Agr, Thuwal, Saudi Arabia
[3] Christian Albrechts Univ Kiel, Inst Comp Sci, Kiel, Germany
[4] Adam Mickiewicz Univ, Fac Math & Comp Sci, Poznan, Poland
Keywords
artificial intelligence; data generation and annotation; disease detection; greenhouse farming; machine learning; synthetic data; tomato plants
DOI
10.3389/fpls.2024.1360113
Chinese Library Classification number
Q94 [Botany]
Discipline classification code
071001
Abstract
The rise of artificial intelligence (AI), and in particular of modern machine learning (ML) algorithms, during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, these algorithms have one main drawback: the need for sufficient and diverse training data. The collection and annotation of real datasets are the main cost drivers of ML development, and while promising results with synthetically generated training data have been shown, its generation is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we introduce a structured, quantitative approach. Our evaluation shows superior generalization results compared to using non-task-specific real training data, and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
Pages: 16