Efficient Genome-Wide TagSNP Selection Across Populations via the Linkage Disequilibrium Criterion

Cited by: 7
Authors
Liu, Lan [1, 2]
Wu, Yonghui [1, 2]
Lonardi, Stefano [1 ]
Jiang, Tao [1 ]
Affiliations
[1] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92507 USA
[2] Google Inc, Mountain View, CA USA
Keywords
genome-wide tagSNP selection; greedy algorithm; HapMap; Lagrangian relaxation; linkage disequilibrium; multiple populations; SINGLE-NUCLEOTIDE POLYMORPHISMS; HAPLOTYPE-TAGGING SNPS; SET; BLOCKS; ASSOCIATION; ALGORITHM; PATTERNS; MAP;
DOI
10.1089/cmb.2007.0228
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
In this article, we studied the tag single-nucleotide polymorphism (tagSNP) selection problem on multiple populations using the pairwise r² linkage disequilibrium criterion. We proposed a novel combinatorial optimization model for the tagSNP selection problem, called the minimum common tagSNP selection (MCTS) problem, and presented efficient solutions for MCTS. Our approach consists of the following three main steps: (i) partitioning the SNP markers into small disjoint components, (ii) applying some data reduction rules to simplify the problem, and (iii) applying either a fast greedy algorithm or a Lagrangian relaxation algorithm to solve the remaining (general) MCTS. These algorithms also provide lower bounds on tagging (i.e., the minimum number of tagSNPs needed). The lower bounds allow us to evaluate how far our solution is from the optimum. To the best of our knowledge, this is the first time that tagging lower bounds have been discussed in the literature. We assessed the performance of our algorithms on real HapMap data for genome-wide tagging. The experiments demonstrated that our algorithms run 3-4 orders of magnitude faster than the existing single-population tagging programs such as FESTA, LD-Select, and the multiple-population tagging method MultiPop-TagSelect. Our method also greatly reduced the required tagSNPs compared with LD-Select on a single population and MultiPop-TagSelect on multiple populations. Moreover, the numbers of tagSNPs selected by our algorithms are almost optimal because they are very close to the corresponding lower bounds obtained by our method.
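The greedy step mentioned in the abstract can be illustrated with a generic set-cover heuristic. The sketch below is not the authors' implementation: the function name `greedy_common_tagsnps`, the input layout (one dictionary of pairwise r² values per population), and the default threshold of 0.8 are all assumptions made for illustration, and the partitioning, data reduction, and Lagrangian relaxation steps are omitted. It treats every (population, SNP) pair as an item to be covered and repeatedly picks the SNP that tags the most still-uncovered items.

```python
def greedy_common_tagsnps(snps, r2_by_pop, threshold=0.8):
    """Greedy set-cover sketch for picking tagSNPs shared by all populations.

    snps       -- list of SNP identifiers (hypothetical input layout)
    r2_by_pop  -- {population: {(snp_a, snp_b): r2_value}}
    threshold  -- assumed r^2 cutoff; a SNP tags another if r^2 >= threshold
    """
    # covers[s] holds the (population, snp) pairs that SNP s would tag.
    covers = {s: set() for s in snps}
    for pop, r2 in r2_by_pop.items():
        for s in snps:
            covers[s].add((pop, s))  # every SNP tags itself
        for (a, b), value in r2.items():
            if value >= threshold:
                covers[a].add((pop, b))
                covers[b].add((pop, a))

    # Every SNP must be tagged in every population.
    uncovered = {(pop, s) for pop in r2_by_pop for s in snps}
    tags = []
    while uncovered:
        # Pick the SNP covering the most uncovered pairs;
        # ties go to the alphabetically first SNP.
        best = max(sorted(snps), key=lambda s: len(covers[s] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Toy data (invented for illustration, not taken from HapMap).
example_r2 = {
    "pop1": {("A", "B"): 0.90, ("B", "C"): 0.85, ("C", "D"): 0.30},
    "pop2": {("A", "B"): 0.50, ("B", "C"): 0.95, ("C", "D"): 0.90},
}
selected = greedy_common_tagsnps(["A", "B", "C", "D"], example_r2)
print(selected)
```

On this toy input, A and B are strongly linked only in the first population, so a tag set chosen for that population alone would not be sufficient for the second; the greedy loop keeps adding SNPs until both populations are covered. A greedy heuristic of this kind gives no optimality guarantee by itself, which is why the lower bounds described in the abstract matter for judging solution quality.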
Pages: 21-37
Number of pages: 17
Related papers
50 in total
  • [21] Analysis of genome-wide linkage disequilibrium in the highly polyploid sugarcane
    Raboin, Louis-Marie
    Pauquet, Jerome
    Butterfield, Mike
    D'Hont, Angelique
    Glaszmann, Jean-Christophe
    THEORETICAL AND APPLIED GENETICS, 2008, 116 (05) : 701 - 714
  • [22] CHROMSCAN: genome-wide association using a linkage disequilibrium map
    Andrew Collins
    Winston Lau
    Journal of Human Genetics, 2008, 53 : 121 - 126
  • [23] The genome-wide distribution of background linkage disequilibrium in a population isolate
    Service, SK
    Ophoff, RA
    Freimer, NB
    HUMAN MOLECULAR GENETICS, 2001, 10 (05) : 545 - 551
  • [24] AUTOGSCAN: Powerful tools for automated genome-wide linkage and linkage disequilibrium analysis
    Hiekkalinna, T
    Terwilliger, JD
    Sammalisto, S
    Peltonen, L
    Perola, M
    TWIN RESEARCH AND HUMAN GENETICS, 2005, 8 (01) : 16 - 21
  • [25] Supervised learning-based tagSNP selection for genome-wide disease classifications
    Liu, Qingzhong
    Yang, Jack
    Chen, Zhongxue
    Yang, Mary Qu
    Sung, Andrew H.
    Huang, Xudong
    BMC GENOMICS, 2008, 9 (Suppl 1)
  • [26] Supervised learning-based tagSNP selection for genome-wide disease classifications
    Qingzhong Liu
    Jack Yang
    Zhongxue Chen
    Mary Qu Yang
    Andrew H Sung
    Xudong Huang
    BMC Genomics, 9
  • [27] Short communication: Characterization of the genome-wide linkage disequilibrium in 2 divergent selection lines of dairy cows
    Banos, G.
    Coffey, M. P.
    JOURNAL OF DAIRY SCIENCE, 2010, 93 (06) : 2775 - 2778
  • [28] Nonlinear analysis of time series in genome-wide linkage disequilibrium data
    Hernandez-Lemus, Enrique
    Estrada-Gil, Jesus K.
    Silva-Zolezzi, Irma
    Fernandez-Lopez, J. Carlos
    Hidalgo-Miranda, Alfredo
    Jimenez-Sanchez, Gerardo
    BIOLOGICAL PHYSICS, 2008, 978 : 34 - +
  • [29] Genome-wide linkage disequilibrium analysis in bread wheat and durum wheat
    Somers, Daryl J.
    Banks, Travis
    DePauw, Ron
    Fox, Stephen
    Clarke, John
    Pozniak, Curtis
    McCartney, Curt
    GENOME, 2007, 50 (06) : 557 - 567
  • [30] Robust Tests in Genome-Wide Scans under Incomplete Linkage Disequilibrium
    Zheng, Gang
    Joo, Jungnam
    Zaykin, Dmitri
    Wu, Colin
    Geller, Nancy
    STATISTICAL SCIENCE, 2009, 24 (04) : 503 - 516