Sample size determination for training set optimization in genomic prediction

Cited by: 9
Authors
Wu, Po-Ya [1, 2]
Ou, Jen-Hsiang [1, 3]
Liao, Chen-Tuo [1]
Affiliations
[1] Natl Taiwan Univ, Dept Agron, Taipei, Taiwan
[2] Heinrich Heine Univ, Inst Quant Genet & Genom Plants, Dusseldorf, Germany
[3] Uppsala Univ, Dept Med Biochem & Microbiol, Uppsala, Sweden
Keywords
CALIBRATION SET; LINEAR-MODELS; SELECTION; ACCURACY; INDIVIDUALS; REGRESSION; PRECISION
DOI
10.1007/s00122-023-04254-9
Chinese Library Classification
S3 [Agricultural science (agronomy)]
Discipline classification code
0901
Abstract
Genomic prediction (GP) is a statistical method used to select quantitative traits in animal or plant breeding. For this purpose, a statistical prediction model is first built that uses phenotypic and genotypic data in a training set. The trained model is then used to predict genomic estimated breeding values (GEBVs) for individuals within a breeding population. Setting the sample size of the training set usually takes into account time and space constraints that are inevitable in an agricultural experiment. However, the determination of the sample size remains an unresolved issue for a GP study. By applying the logistic growth curve to identify prediction accuracy for the GEBVs and the training set size, a practical approach was developed to determine a cost-effective optimal training set for a given genome dataset with known genotypic data. Three real genome datasets were used to illustrate the proposed approach. An R function is provided to facilitate widespread application of this approach to sample size determination, which can help breeders to identify a set of genotypes with an economical sample size for selective phenotyping.
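As a rough illustration of the idea in the abstract, the sketch below fits a logistic growth curve to prediction accuracy as a function of training set size and reads off an economical size. It is not the authors' R function. The three-parameter form `K / (1 + exp(-r * (n - n0)))`, the rule "smallest size that reaches 95% of the fitted plateau", the fitting shortcut (a grid over `K` with a linearized regression), and the data points are all assumptions made for this example; the data are synthetic and noise-free.

```python
import math

def logistic(n, K, r, n0):
    """Assumed three-parameter logistic curve: accuracy at training set size n."""
    return K / (1.0 + math.exp(-r * (n - n0)))

def fit_logistic(sizes, accs, grid_steps=200):
    """Fit (K, r, n0) by trying plateau values K on a grid.

    For a fixed K the curve linearizes: log(K / y - 1) = -r * n + r * n0,
    so r and n0 follow from simple linear regression. The K with the
    smallest squared error on the original scale is kept.
    """
    y_max = max(accs)
    n_bar = sum(sizes) / len(sizes)
    sxx = sum((n - n_bar) ** 2 for n in sizes)
    best = None
    for i in range(1, grid_steps + 1):
        K = y_max * (1.0 + 0.5 * i / grid_steps)  # plateau must exceed every observation
        z = [math.log(K / y - 1.0) for y in accs]
        z_bar = sum(z) / len(z)
        slope = sum((n - n_bar) * (zi - z_bar) for n, zi in zip(sizes, z)) / sxx
        r = -slope
        if r <= 0:
            continue  # accuracy should increase with training set size
        n0 = (z_bar - slope * n_bar) / r
        sse = sum((logistic(n, K, r, n0) - y) ** 2 for n, y in zip(sizes, accs))
        if best is None or sse < best[0]:
            best = (sse, K, r, n0)
    if best is None:
        raise ValueError("no increasing logistic curve fits these data")
    return best[1], best[2], best[3]

def size_for_fraction(r, n0, fraction=0.95):
    """Smallest integer n whose fitted accuracy reaches `fraction` of the plateau."""
    return math.ceil(n0 + math.log(fraction / (1.0 - fraction)) / r)

# Synthetic, noise-free accuracies generated from a known curve (illustration only).
sizes = list(range(25, 301, 25))
accs = [logistic(n, 0.7, 0.02, 100.0) for n in sizes]

K, r, n0 = fit_logistic(sizes, accs)
n_opt = size_for_fraction(r, n0, fraction=0.95)
print(f"plateau={K:.3f}, rate={r:.4f}, midpoint={n0:.1f}, suggested size={n_opt}")
```

In practice the accuracies would come from cross-validation or a similar estimate at each candidate size, and the 95% threshold would be replaced by whatever cost criterion the breeding program uses.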
Pages: 13