Sample size determination for training set optimization in genomic prediction

Cited by: 9
Authors
Wu, Po-Ya [1, 2]
Ou, Jen-Hsiang [1, 3]
Liao, Chen-Tuo [1]
Affiliations
[1] Natl Taiwan Univ, Dept Agron, Taipei, Taiwan
[2] Heinrich Heine Univ, Inst Quant Genet & Genom Plants, Dusseldorf, Germany
[3] Uppsala Univ, Dept Med Biochem & Microbiol, Uppsala, Sweden
Keywords
CALIBRATION SET; LINEAR-MODELS; SELECTION; ACCURACY; INDIVIDUALS; REGRESSION; PRECISION;
DOI
10.1007/s00122-023-04254-9
Chinese Library Classification number
S3 [Agricultural science (agronomy)];
Discipline classification code
0901;
Abstract
Genomic prediction (GP) is a statistical method used to select for quantitative traits in animal or plant breeding. For this purpose, a statistical prediction model is first built that uses phenotypic and genotypic data in a training set. The trained model is then used to predict genomic estimated breeding values (GEBVs) for individuals within a breeding population. Setting the sample size of the training set usually takes into account time and space constraints that are inevitable in an agricultural experiment. However, the determination of the sample size remains an unresolved issue for a GP study. By applying the logistic growth curve to describe the relationship between the prediction accuracy of the GEBVs and the training set size, a practical approach was developed to determine a cost-effective optimal training set for a given genome dataset with known genotypic data. Three real genome datasets were used to illustrate the proposed approach. An R function is provided to facilitate widespread application of this approach to sample size determination, which can help breeders to identify a set of genotypes with an economical sample size for selective phenotyping.
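The abstract does not give the exact curve parameterization or the rule used to pick the training set size, so the following is only a minimal sketch of the general idea, written in Python rather than the R function the authors provide. It assumes a three-parameter logistic growth curve for prediction accuracy as a function of training set size, and it assumes (purely for illustration) that the chosen size is the smallest one at which the curve reaches 95% of its upper asymptote. The function names and the parameter values are hypothetical and are not taken from the paper.

```python
import math


def logistic_accuracy(n, upper, midpoint, scale):
    """Three-parameter logistic growth curve.

    n        : training set size
    upper    : upper asymptote (accuracy approached as n grows)
    midpoint : size at which accuracy equals upper / 2
    scale    : controls how quickly accuracy rises around the midpoint
    """
    return upper / (1.0 + math.exp(-(n - midpoint) / scale))


def size_for_fraction(fraction, midpoint, scale):
    """Smallest integer n at which the curve reaches `fraction` of its asymptote.

    Solving upper / (1 + exp(-(n - midpoint) / scale)) = fraction * upper
    for n gives n = midpoint + scale * ln(fraction / (1 - fraction)).
    The asymptote cancels, so it is not needed here.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    n_exact = midpoint + scale * math.log(fraction / (1.0 - fraction))
    return max(1, math.ceil(n_exact))


# Hypothetical curve parameters, standing in for values that would be
# estimated from accuracy-versus-size results on a real genome dataset.
upper, midpoint, scale = 0.70, 60.0, 40.0

# Assumed criterion: reach 95% of the asymptotic accuracy.
n_star = size_for_fraction(0.95, midpoint, scale)
print(n_star, round(logistic_accuracy(n_star, upper, midpoint, scale), 4))
```

In practice the three parameters would first be estimated by fitting the curve to prediction accuracies obtained at several candidate training set sizes; the threshold fraction then expresses the trade-off between phenotyping cost and the accuracy that is given up.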
Pages: 13
Related papers
50 items in total
  • [41] DETERMINATION OF THE MINIMUM SAMPLE SIZE
    Bakaeva, O. A.
    MORDOVIA UNIVERSITY BULLETIN, 2010, 4 : 111 - 114
  • [42] On Bayesian sample size determination
    Nassar, M. M.
    Khamis, S. M.
    Radwan, S. S.
    JOURNAL OF APPLIED STATISTICS, 2011, 38 (05) : 1045 - 1054
  • [43] Sample size determination: A review
    Adcock, CJ
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES D-THE STATISTICIAN, 1997, 46 (02) : 261 - 283
  • [44] Optimization of training sets for genomic prediction of early-stage single crosses in maize
    Kadam, Dnyaneshwar C.
    Rodriguez, Oscar R.
    Lorenz, Aaron J.
    THEORETICAL AND APPLIED GENETICS, 2021, 134 (02) : 687 - 699
  • [45] Optimization of training sets for genomic prediction of early-stage single crosses in maize
    Dnyaneshwar C. Kadam
    Oscar R. Rodriguez
    Aaron J. Lorenz
    Theoretical and Applied Genetics, 2021, 134 : 687 - 699
  • [46] An Improved Training Sample Set Construction Method
    Shen, Li
    Zhai, Jiaojiao
    2018 3RD INTERNATIONAL CONFERENCE ON COMMUNICATION, IMAGE AND SIGNAL PROCESSING, 2019, 1169
  • [47] Sample size determination for EQ-5D-5L value set studies
    Mihir Gandhi
    Ying Xu
    Nan Luo
    Yin Bun Cheung
    Quality of Life Research, 2017, 26 : 3365 - 3376
  • [48] Sample size determination for EQ-5D-5L value set studies
    Gandhi, Mihir
    Xu, Ying
    Luo, Nan
    Cheung, Yin Bun
    QUALITY OF LIFE RESEARCH, 2017, 26 (12) : 3365 - 3376
  • [49] Sample Size and Reproducibility of Gene Set Analysis
    Maleki, Farhad
    Ovens, Katie
    McQuillan, Ian
    Kusalik, Anthony J.
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 122 - 129
  • [50] Training Population Optimization for Genomic Selection
    Berro, Ines
    Lado, Bettina
    Nalin, Rafael S.
    Quincke, Martin
    Gutierrez, Lucia
    PLANT GENOME, 2019, 12 (03):