Applications of random forest feature selection for fine-scale genetic population assignment

Cited by: 88
Authors
Sylvester, Emma V. A. [1]
Bentzen, Paul [2]
Bradbury, Ian R. [3]
Clement, Marie [4, 5]
Pearce, Jon [6]
Horne, John [2]
Beiko, Robert G. [1]
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, Halifax, NS, Canada
[2] Dalhousie Univ, Dept Biol, Marine Gene Probe Lab, Halifax, NS, Canada
[3] Fisheries & Oceans Canada, St John, NF, Canada
[4] Mem Univ Newfoundland, Ctr Fisheries Ecosyst Res, Fisheries & Marine Inst, St John, NF, Canada
[5] Mem Univ Newfoundland, Labrador Inst, Happy Valley Goose Bay, NF, Canada
[6] Northern SE Reg Aquaculture Assoc, Hidden Falls Hatchery, Sitka, AK, USA
Source
EVOLUTIONARY APPLICATIONS | 2018, Volume 11, Issue 2
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
conservation genetics; fisheries management; individual assignment; random forest; SNP selection; CHINOOK SALMON; ATLANTIC SALMON; MICROSATELLITES; DISCRIMINATION; CONSERVATION; HABITAT; LOCI;
DOI
10.1111/eva.12524
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Genetic population assignment used to inform wildlife management and conservation efforts requires panels of highly informative genetic markers and sensitive assignment tests. We explored the utility of machine learning algorithms (random forest, regularized random forest and guided regularized random forest) compared with F_ST ranking for selection of single nucleotide polymorphisms (SNPs) for fine-scale population assignment. We applied these methods to an unpublished SNP data set for Atlantic salmon (Salmo salar) and a published SNP data set for Alaskan Chinook salmon (Oncorhynchus tshawytscha). In each species, we identified the minimum panel size required to obtain a self-assignment accuracy of at least 90% using each method to create panels of 50-700 markers. Panels of SNPs identified using random forest-based methods performed up to 7.8 and 11.2 percentage points better than F_ST-selected panels of similar size for the Atlantic salmon and Chinook salmon data, respectively. Self-assignment accuracy >= 90% was obtained with panels of 670 and 384 SNPs for each data set, respectively, a level of accuracy never reached for these species using F_ST-selected panels. Our results demonstrate a role for machine learning approaches in marker selection across large genomic data sets to improve assignment for management and conservation of exploited populations.
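As a rough illustration of the F_ST ranking baseline mentioned in the abstract, the sketch below ranks biallelic SNPs by a simple per-locus F_ST and keeps the top k as a panel. It is a minimal sketch under stated assumptions, not the authors' pipeline: it uses the basic estimator (H_T - H_S) / H_T with unweighted population means (reasonable only for equal sample sizes), genotypes coded as 0/1/2 copies of one allele, and hypothetical names (`fst_per_locus`, `top_k_loci`, `toy`). The random forest side of the comparison would need a machine learning library and is not shown.

```python
from typing import Dict, List, Sequence

# Assumed input layout: population name -> list of individuals,
# each individual a sequence of allele counts (0, 1 or 2) per locus.
Genotypes = Dict[str, List[Sequence[int]]]


def allele_freqs(individuals: List[Sequence[int]], n_loci: int) -> List[float]:
    """Frequency of the counted allele at each locus within one population."""
    n = len(individuals)
    return [sum(ind[j] for ind in individuals) / (2 * n) for j in range(n_loci)]


def fst_per_locus(pops: Genotypes) -> List[float]:
    """Simple per-locus F_ST = (H_T - H_S) / H_T, unweighted across populations."""
    n_loci = len(next(iter(pops.values()))[0])
    freqs = [allele_freqs(inds, n_loci) for inds in pops.values()]
    result = []
    for j in range(n_loci):
        p = [f[j] for f in freqs]
        p_bar = sum(p) / len(p)
        h_t = 2 * p_bar * (1 - p_bar)                    # total expected heterozygosity
        h_s = sum(2 * x * (1 - x) for x in p) / len(p)   # mean within-population value
        result.append((h_t - h_s) / h_t if h_t > 0 else 0.0)
    return result


def top_k_loci(pops: Genotypes, k: int) -> List[int]:
    """Indices of the k loci with the highest F_ST, best first."""
    fst = fst_per_locus(pops)
    return sorted(range(len(fst)), key=lambda j: fst[j], reverse=True)[:k]


# Toy data (invented for illustration): locus 0 is fixed for different
# alleles in A and B, locus 1 differs slightly, locus 2 does not differ.
toy: Genotypes = {
    "A": [(2, 1, 0), (2, 1, 1), (2, 0, 0)],
    "B": [(0, 1, 0), (0, 1, 1), (0, 1, 0)],
}
fst_values = fst_per_locus(toy)
ranking = top_k_loci(toy, 2)
```

On the toy data, the locus that is fixed for different alleles in the two populations ranks first and the locus with identical frequencies is excluded from the two-marker panel. A univariate ranking of this kind scores each locus on its own, which is the main point of contrast with tree-based selection.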
Pages: 153-165
Page count: 13