Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture

Cited by: 0
Authors
Klein, Jonathan [1]
Waller, Rebekah [2]
Pirk, Soeren [3]
Palubicki, Wojtek [4]
Tester, Mark [2]
Michels, Dominik L. [1]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Computat Sci Grp, Thuwal, Saudi Arabia
[2] King Abdullah Univ Sci & Technol KAUST, Ctr Desert Agr, Thuwal, Saudi Arabia
[3] Christian Albrechts Univ Kiel, Inst Comp Sci, Kiel, Germany
[4] Adam Mickiewicz Univ, Fac Math & Comp Sci, Poznan, Poland
Keywords
artificial intelligence; data generation and annotation; disease detection; greenhouse farming; machine learning; synthetic data; tomato plants
DOI
10.3389/fpls.2024.1360113
Chinese Library Classification (CLC) number
Q94 [Botany]
Discipline classification code
071001
Abstract
The rise of artificial intelligence (AI) and in particular modern machine learning (ML) algorithms during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection of real datasets and their annotation are the main cost drivers of ML developments, and while promising results on synthetically generated training data have been shown, their generation is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained by exclusively using synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we instead introduce a structured, quantitative approach. Our evaluation shows superior generalization results when compared to using non-task-specific real training data and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
Pages: 16