Standardizing and Centralizing Datasets for Efficient Training of Agricultural Deep Learning Models

Cited by: 4
|
Authors
Joshi, Amogh [1 ,2 ,3 ]
Guevara, Dario [1 ,2 ,3 ]
Earles, Mason [1 ,2 ,3 ]
Affiliations
[1] Univ Calif Davis, Dept Viticulture & Enol, Davis, CA 95616 USA
[2] Univ Calif Davis, Dept Biol & Agr Engn, Davis, CA 95616 USA
[3] Univ Calif Davis, AI Inst Next Generat Food Syst AIFS, Davis, CA 95616 USA
Source
PLANT PHENOMICS | 2023 / Vol. 5
DOI
10.34133/plantphenomics.0084
Chinese Library Classification
S3 [Agriculture (Agronomy)]
Subject Classification
0901
Abstract
In recent years, deep learning models have become the standard for agricultural computer vision. Such models are typically fine-tuned to agricultural tasks using model weights that were originally fit to more general, non-agricultural datasets. This lack of agriculture-specific fine-tuning potentially increases training time and resource use, and decreases model performance, leading to an overall decrease in data efficiency. To overcome this limitation, we collect a wide range of existing public datasets for 3 distinct tasks, standardize them, and construct standard training and evaluation pipelines, providing us with a set of benchmarks and pretrained models. We then conduct a number of experiments using methods that are commonly used in deep learning tasks but unexplored in their domain-specific applications for agriculture. Our experiments guide us in developing a number of approaches to improve data efficiency when training agricultural deep learning models, without large-scale modifications to existing pipelines. Our results demonstrate that even slight training modifications, such as using agricultural pretrained model weights, or adopting specific spatial augmentations into data processing pipelines, can considerably boost model performance and result in shorter convergence time, saving training resources. Furthermore, we find that even models trained on low-quality annotations can produce comparable levels of performance to their high-quality equivalents, suggesting that datasets with poor annotations can still be used for training, expanding the pool of currently available datasets. Our methods are broadly applicable throughout agricultural deep learning and present high potential for substantial data efficiency improvements.
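A minimal PyTorch sketch of the two lightest-weight interventions the abstract describes: initializing from agriculture-pretrained weights instead of generic ones, and adding spatial augmentations to the data-processing pipeline. The checkpoint filename, the ResNet-50 backbone, and the particular crop/flip transforms below are illustrative assumptions, not details specified in this record.

    import torch
    import torch.nn as nn
    from torchvision import models, transforms

    # Hypothetical path to agriculture-pretrained weights; the record
    # does not name a concrete checkpoint file.
    AG_CHECKPOINT = "agricultural_resnet50.pth"
    NUM_CLASSES = 10  # e.g., number of crop or disease classes in the target task

    # Spatial augmentations of the kind the abstract credits with
    # boosting performance: random crops and flips in the data pipeline.
    train_transforms = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.RandomVerticalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    # Start from an uninitialized backbone, then load agriculture-pretrained
    # weights rather than generic (e.g., ImageNet) ones.
    model = models.resnet50(weights=None)
    state_dict = torch.load(AG_CHECKPOINT, map_location="cpu")
    model.load_state_dict(state_dict, strict=False)  # tolerate a mismatched head

    # Replace the classification head for the downstream agricultural task,
    # then fine-tune all parameters as usual.
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

Consistent with the abstract's claim, a swap of this kind leaves the rest of the training pipeline untouched while changing the initialization and augmentation behavior that drive convergence time.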
Pages: 15
Related Papers
50 records in total
  • [1] When small is too small? Training Deep Learning models in limited datasets
    Valdes, G.
    Romero, M.
    Interian, Y.
    Solberg, T.
    [J]. RADIOTHERAPY AND ONCOLOGY, 2020, 152 : S825 - S825
  • [2] Deep Learning in Disease Diagnosis: Models and Datasets
    Saxena, Deeksha
    Siddiqui, Mohammed Haris
    Kumar, Rajnish
    [J]. CURRENT BIOINFORMATICS, 2021, 16 (05) : 632 - 643
  • [3] Efficient Training of Deep Learning Models Through Improved Adaptive Sampling
    Avalos-Lopez, Jorge Ivan
    Rojas-Dominguez, Alfonso
    Ornelas-Rodriguez, Manuel
    Carpio, Martin
    Valdez, S. Ivvan
    [J]. PATTERN RECOGNITION (MCPR 2021), 2021, 12725 : 141 - 152
  • [4] Deep learning in retrosynthesis planning: datasets, models and tools
    Dong, Jingxin
    Zhao, Mingyi
    Liu, Yuansheng
    Su, Yansen
    Zeng, Xiangxiang
    [J]. BRIEFINGS IN BIOINFORMATICS, 2022, 23 (01)
  • [5] Benchmarking deep learning models on large healthcare datasets
    Purushotham, Sanjay
    Meng, Chuizheng
    Che, Zhengping
    Liu, Yan
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2018, 83 : 112 - 134
  • [6] Parallelizing Training of Deep Generative Models on Massive Scientific Datasets
    Jacobs, Sam Ade
    Van Essen, Brian
    Hysom, David
    Yeom, Jae-Seung
    Moon, Tim
    Anirudh, Rushil
    Thiagarajan, Jayaraman J.
    Liu, Shusen
    Bremer, Peer-Timo
    Gaffney, Jim
    Benson, Tom
    Robinson, Peter
    Peterson, Luc
    Spears, Brian
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2019, : 115 - 124
  • [7] Deep Learning Models Compression for Agricultural Plants
    Fountsop, Arnauld Nzegha
    Ebongue Kedieng Fendji, Jean Louis
    Atemkeng, Marcellin
    [J]. APPLIED SCIENCES-BASEL, 2020, 10 (19) : 1 - 19
  • [8] Training deep retrieval models with noisy datasets: Bag exponential loss
    Martinez-Cortes, Tomas
    Gonzalez-Diaz, Ivan
    Diaz-de-Maria, Fernando
    [J]. PATTERN RECOGNITION, 2021, 112
  • [9] Continuous Training and Deployment of Deep Learning Models
    Prapas, Ioannis
    Derakhshan, Behrouz
    Mahdiraji, Alireza Rezaei
    Markl, Volker
    [J]. Datenbank-Spektrum, 2021, 21 (03) : 203 - 212
  • [10] Towards Training Reproducible Deep Learning Models
    Chen, Boyuan
    Wen, Mingzhi
    Shi, Yong
    Lin, Dayi
    Rajbahadur, Gopi Krishnan
    Jiang, Zhen Ming
    [J]. 2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 2202 - 2214