HaploCart: Human mtDNA haplogroup classification using a pangenomic reference graph

Cited by: 6
Authors
Rubin, Joshua Daniel [1]
Vogel, Nicola Alexandra [1]
Gopalakrishnan, Shyam [2]
Sackett, Peter Wad [1]
Renaud, Gabriel [1]
Affiliations
[1] Tech Univ Denmark, Dept Hlth Technol, Lyngby, Denmark
[2] Univ Copenhagen, Sect Hologen, Copenhagen, Denmark
Keywords
MITOCHONDRIAL-DNA HAPLOGROUPS; SEQUENCE; ASSOCIATION; GENOME; RISK;
DOI
10.1371/journal.pcbi.1011148
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Author summary: Pangenome graphs are powerful and relatively nascent data structures for representing an entire collection of genomic sequences and their homology. Here we present HaploCart, a tool which leverages the power of pangenomics, in conjunction with maximum-likelihood estimation, to improve human mtDNA haplotype inference on single-source samples (i.e. the sample is not a mixture of multiple contributors, be they human or contaminant). In this context, mapping to many reference genomes at once vastly reduces the Eurocentric bias inherent in contemporary methods, and also improves haplotyping performance at low coverage depths. We show that HaploCart is far more accurate than competing programs on simulated and empirical datasets, and reports clade-level posterior probabilities that accurately reflect confidence in our phylogenetic assignments. Our work can easily be generalized to other haploid markers and suggests that pangenome-based approaches combined with Bayesian methods show promise for improving inference and mitigating ethnicity-related bias in a large class of bioinformatics problems involving sequencing data.
Current mitochondrial DNA (mtDNA) haplogroup classification tools map reads to a single reference genome and perform inference based on the detected mutations to this reference. This approach biases haplogroup assignments towards the reference and prohibits accurate calculations of the uncertainty in assignment. We present HaploCart, a probabilistic mtDNA haplogroup classifier which uses a pangenomic reference graph framework together with principles of Bayesian inference. We demonstrate that our approach significantly outperforms available tools by being more robust to lower coverage or incomplete consensus sequences and producing phylogenetically aware confidence scores that are unbiased towards any haplogroup. HaploCart is available both as a command-line tool and through a user-friendly web interface. The C++ program accepts as input consensus FASTA, FASTQ, or GAM files, and outputs a text file with the haplogroup assignments of the samples along with the level of confidence in the assignments. Our work considerably reduces the amount of data required to obtain a confident mitochondrial haplogroup assignment.
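As a rough illustration of what a clade-level posterior probability means, the Python sketch below normalizes per-haplogroup log-likelihoods into posteriors and sums them over a clade. It is not HaploCart's implementation: the uniform prior, the toy tree in `parent`, the log-likelihood values, and the function names `posteriors` and `clade_posterior` are all assumptions made for this example.

```python
import math


def posteriors(log_likelihoods):
    """Turn per-haplogroup log-likelihoods into posteriors (uniform prior assumed)."""
    peak = max(log_likelihoods.values())  # subtract the maximum for numerical stability
    weights = {h: math.exp(ll - peak) for h, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def clade_posterior(clade, parent, post):
    """Sum the posterior mass of every haplogroup at or below `clade` in the tree."""
    def in_clade(node):
        # Walk up towards the root; the node belongs to the clade if we meet it.
        while node is not None:
            if node == clade:
                return True
            node = parent.get(node)
        return False

    return sum(p for h, p in post.items() if in_clade(h))


# Hypothetical toy tree (child -> parent) and made-up log-likelihoods.
parent = {"H": None, "H1": "H", "H1a": "H1", "H2": "H"}
log_likelihoods = {"H": -120.0, "H1": -101.0, "H1a": -100.0, "H2": -115.0}

post = posteriors(log_likelihoods)
for clade in ("H1a", "H1", "H"):
    print(clade, round(clade_posterior(clade, parent, post), 4))
```

In this toy case the data cannot sharply separate H1a from its parent H1, so the posterior for the single haplogroup H1a is moderate while the posterior for the enclosing clade H1 is close to 1. This is the sense in which a confidence score can be phylogenetically aware.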
Pages: 27