HaploCart: Human mtDNA haplogroup classification using a pangenomic reference graph

Cited by: 6
Authors
Rubin, Joshua Daniel [1]
Vogel, Nicola Alexandra [1]
Gopalakrishnan, Shyam [2]
Sackett, Peter Wad [1]
Renaud, Gabriel [1]
Affiliations
[1] Tech Univ Denmark, Dept Hlth Technol, Lyngby, Denmark
[2] Univ Copenhagen, Sect Hologen, Copenhagen, Denmark
Keywords
MITOCHONDRIAL-DNA HAPLOGROUPS; SEQUENCE; ASSOCIATION; GENOME; RISK
DOI
10.1371/journal.pcbi.1011148
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Current mitochondrial DNA (mtDNA) haplogroup classification tools map reads to a single reference genome and perform inference based on the detected mutations to this reference. This approach biases haplogroup assignments towards the reference and prohibits accurate calculations of the uncertainty in assignment. We present HaploCart, a probabilistic mtDNA haplogroup classifier which uses a pangenomic reference graph framework together with principles of Bayesian inference. We demonstrate that our approach significantly outperforms available tools by being more robust to lower coverage or incomplete consensus sequences and producing phylogenetically-aware confidence scores that are unbiased towards any haplogroup. HaploCart is available both as a command-line tool and through a user-friendly web interface. The C++ program accepts as input consensus FASTA, FASTQ, or GAM files, and outputs a text file with the haplogroup assignments of the samples along with the level of confidence in the assignments. Our work considerably reduces the amount of data required to obtain a confident mitochondrial haplogroup assignment.

Author summary: Pangenome graphs are powerful and relatively nascent data structures for representing an entire collection of genomic sequences and their homology. Here we present HaploCart, a tool which leverages the power of pangenomics, in conjunction with maximum-likelihood estimation, to improve human mtDNA haplotype inference on single-source samples (i.e. the sample is not a mixture of multiple contributors, be they human or contaminant). In this context, mapping to many reference genomes at once vastly reduces the Eurocentric bias inherent in contemporary methods, and also improves haplotyping performance at low coverage depths. We show that HaploCart is far more accurate than competing programs on simulated and empirical datasets, and reports clade-level posterior probabilities that accurately reflect confidence in our phylogenetic assignments. Our work can easily be generalized to other haploid markers and suggests that pangenome-based approaches combined with Bayesian methods show promise for improving inference and mitigating ethnicity-related bias in a large class of bioinformatics problems involving sequencing data.
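
To make the idea of clade-level posterior probabilities concrete, the following is a minimal illustrative sketch, not HaploCart's actual implementation. It assumes that a log-likelihood is already available for each candidate haplogroup, applies a uniform prior unless one is supplied, and sums posteriors up a toy tree. The function names, the tree, and the likelihood values are all hypothetical.

```python
import math
from collections import defaultdict


def haplogroup_posteriors(log_likelihoods, log_priors=None):
    """Turn per-haplogroup log-likelihoods into posterior probabilities.

    A uniform prior is assumed when none is given (an assumption of this sketch).
    """
    if log_priors is None:
        log_priors = {h: 0.0 for h in log_likelihoods}
    scores = {h: ll + log_priors[h] for h, ll in log_likelihoods.items()}
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(scores.values())
    total = sum(math.exp(s - top) for s in scores.values())
    return {h: math.exp(s - top) / total for h, s in scores.items()}


def clade_posteriors(posteriors, parent):
    """Add each haplogroup's posterior to itself and to every ancestor clade."""
    clade = defaultdict(float)
    for haplogroup, prob in posteriors.items():
        node = haplogroup
        while node is not None:
            clade[node] += prob
            node = parent.get(node)
    return dict(clade)


# Hypothetical toy tree: child -> parent (None marks a top-level clade).
toy_parent = {"H": None, "H1": "H", "H2": "H", "U": None, "U5": "U"}
# Made-up log-likelihoods for three candidate haplogroups.
toy_log_likelihoods = {"H1": -10.0, "H2": -10.0, "U5": -12.0}

toy_posteriors = haplogroup_posteriors(toy_log_likelihoods)
toy_clades = clade_posteriors(toy_posteriors, toy_parent)
for name in sorted(toy_clades):
    print(f"{name}: {toy_clades[name]:.4f}")
```

In this toy case the two H subclades are equally likely, so neither is a confident call on its own, yet the parent clade H collects both of their probabilities. That is the sense in which a clade-level score can express confidence at a coarser level of the tree when the data cannot resolve the finer one.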
Pages: 27