HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data

Cited by: 16
Author
Hochreiter, Sepp [1]
Affiliation
[1] Johannes Kepler Univ Linz, Inst Bioinformat, A-4040 Linz, Austria
Keywords
COPY NUMBER VARIATIONS; LINKAGE DISEQUILIBRIUM; BY-DESCENT; HAPLOTYPE BLOCKS; GENOME; ASSOCIATION; COALESCENT; EXPRESSION; SELECTION; HISTORY;
DOI
10.1093/nar/gkt1013
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority (152 000 IBD segments) are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in some cases exclusively, in Africans. The lengths of IBD segments and their sharing between continental populations indicate that many short IBD segments from chromosome 1 existed before humans migrated out of Africa. Thus, rare variants that tag these short IBD segments predate human migration from Africa. The software package HapFABIA is available from Bioconductor. All data sets, result files and programs for data simulation, preprocessing and evaluation are supplied at http://www.bioinf.jku.at/research/short-IBD.
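To illustrate the idea that rare variants tag short IBD segments, the following is a minimal sketch only. It is not HapFABIA and does not perform biclustering: it substitutes a much simpler pairwise scan that looks for pairs of individuals sharing several rare minor alleles in close proximity. The function name `shared_rare_segments`, its parameters (`max_freq`, `min_shared`, `max_gap`) and the toy data are all assumptions made for this example and do not come from the paper or the Bioconductor package.

```python
from itertools import combinations


def shared_rare_segments(genotypes, positions, max_freq=0.05,
                         min_shared=3, max_gap=25_000):
    """Toy pairwise scan for candidate short IBD segments (hypothetical helper).

    genotypes: list of rows, one per individual; each row holds 0/1 flags
               saying whether the individual carries the minor allele.
    positions: base-pair position of each variant, sorted ascending.
    Returns tuples (ind_a, ind_b, start_bp, end_bp, n_shared_rare_variants).
    """
    n_ind = len(genotypes)

    # Keep only rare variants: at least two carriers (otherwise nothing can
    # be shared) and a carrier frequency no larger than max_freq.
    rare = []
    for j in range(len(positions)):
        carriers = sum(row[j] for row in genotypes)
        if carriers >= 2 and carriers / n_ind <= max_freq:
            rare.append(j)

    segments = []

    def flush(a, b, run):
        # Report a run only if it contains enough shared rare variants.
        if len(run) >= min_shared:
            segments.append(
                (a, b, positions[run[0]], positions[run[-1]], len(run)))

    for a, b in combinations(range(n_ind), 2):
        shared = [j for j in rare if genotypes[a][j] and genotypes[b][j]]
        run = []
        for j in shared:
            # Start a new run when the gap to the previous shared variant
            # exceeds max_gap base pairs.
            if run and positions[j] - positions[run[-1]] > max_gap:
                flush(a, b, run)
                run = []
            run.append(j)
        flush(a, b, run)
    return segments


# Toy data: 10 individuals, 6 variants (entirely made up for illustration).
toy_positions = [1_000, 5_000, 9_000, 14_000, 90_000, 95_000]
toy_genotypes = [[0] * 6 for _ in range(10)]
for ind in (0, 1):              # individuals 0 and 1 share four rare variants
    for var in (0, 1, 2, 3):
        toy_genotypes[ind][var] = 1
for ind in range(8):            # variant 4 is common (8 of 10 carriers)
    toy_genotypes[ind][4] = 1
for ind in (2, 3):              # a single shared rare variant: too few
    toy_genotypes[ind][5] = 1

toy_segments = shared_rare_segments(toy_genotypes, toy_positions, max_freq=0.2)
```

In this toy run only individuals 0 and 1 yield a candidate segment, because the common variant is filtered out and a single shared rare variant falls below `min_shared`. A pairwise scan like this scales quadratically with the number of individuals, so it is not a substitute for the biclustering approach the abstract describes.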
Pages: 21