Synthetic data at scale: a development model to efficiently leverage machine learning in agriculture

Citations: 0
Authors
Klein, Jonathan [1]
Waller, Rebekah [2]
Pirk, Soeren [3]
Palubicki, Wojtek [4]
Tester, Mark [2]
Michels, Dominik L. [1]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Computat Sci Grp, Thuwal, Saudi Arabia
[2] King Abdullah Univ Sci & Technol KAUST, Ctr Desert Agr, Thuwal, Saudi Arabia
[3] Christian Albrechts Univ Kiel, Inst Comp Sci, Kiel, Germany
[4] Adam Mickiewicz Univ, Fac Math & Comp Sci, Poznan, Poland
Keywords
artificial intelligence; data generation and annotation; disease detection; greenhouse farming; machine learning; synthetic data; tomato plants
DOI
10.3389/fpls.2024.1360113
Chinese Library Classification (CLC) number
Q94 [Botany]
Discipline classification code
071001
Abstract
The rise of artificial intelligence (AI), and in particular of modern machine learning (ML) algorithms, during the last decade has been met with great interest in the agricultural industry. While these algorithms are undisputedly powerful, their main drawback remains the need for sufficient and diverse training data. The collection and annotation of real datasets are the main cost drivers of ML development, and although promising results with synthetically generated training data have been shown, generating such data is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we introduce a structured, quantitative approach. Our evaluation shows superior generalization results compared to using non-task-specific real training data, and a higher cost efficiency of development compared to traditional synthetic training data. We believe that our approach will help to reduce the cost of synthetic data generation in future applications.
Pages: 16
Related papers
50 in total
  • [41] Data Mining Technology with Fuzzy Logic, Neural Networks and Machine Learning for Agriculture
    Kale, Shivani S.
    Patil, Preeti S.
    DATA MANAGEMENT, ANALYTICS AND INNOVATION, ICDMAI 2018, VOL 2, 2019, 839 : 79 - 87
  • [42] Editorial: Recent advances in big data, machine, and deep learning for precision agriculture
    Wozniak, Marcin
    Ijaz, Muhammad Fazal
    FRONTIERS IN PLANT SCIENCE, 2024, 15
  • [43] A Comparative Study and Machine Learning Enabled Efficient Classification for Multispectral Data in Agriculture
    Gupta, Priyanka
    Kanga, Shruti
    Mishra, Varun Narayan
    Singh, Suraj Kumar
    Sivasankar, Thota
    BAGHDAD SCIENCE JOURNAL, 2024, 21 (07) : 2462 - 2484
  • [44] Machine Learning Model for Vaccine Development: A Perspective
    Dubey, Anubha
    BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, 2020, 13 (02) : 762 - 769
  • [45] Synthetic Data Generation With Machine Learning for Network Intrusion Detection Systems
    Newlin, Marvin
    Reith, Mark
    DeYoung, Mark
    PROCEEDINGS OF THE 18TH EUROPEAN CONFERENCE ON CYBER WARFARE AND SECURITY (ECCWS 2019), 2019 : 785 - 789
  • [46] When Machine Learning Models Leak: An Exploration of Synthetic Training Data
    Slokom, Manel
    De Wolf, Peter-Paul
    Larson, Martha
    PRIVACY IN STATISTICAL DATABASES, PSD 2022, 2022, 13463 : 283 - 296
  • [47] Cotton Yield Prediction: A Machine Learning Approach With Field and Synthetic Data
    Mitra, Alakananda
    Beegum, Sahila
    Fleisher, David
    Reddy, Vangimalla R.
    Sun, Wenguang
    Ray, Chittaranjan
    Timlin, Dennis
    Malakar, Arindam
    IEEE ACCESS, 2024, 12 : 101273 - 101288
  • [48] Using Imbalanced Triangle Synthetic Data for Machine Learning Anomaly Detection
    Luo, Menghua
    Wang, Ke
    Cai, Zhiping
    Liu, Anfeng
    Li, Yangyang
    Cheang, Chak Fong
    CMC-COMPUTERS MATERIALS & CONTINUA, 2019, 58 (01) : 15 - 26
  • [49] Improving machine learning-based bitewing segmentation with synthetic data
    Tolstaya, Ekaterina
    Tichy, Antonin
    Paris, Sebastian
    Schwendicke, Falk
    JOURNAL OF DENTISTRY, 2025, 156
  • [50] Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets
    Shuryak, Igor
    PLOS ONE, 2017, 12 (01)