A method for positive forensic identification of samples from extremely low-coverage sequence data

Cited by: 13
Authors
Vohr, Samuel H. [1]
Najar, Carlos Fernando Buen Abad [2]
Shapiro, Beth [3]
Green, Richard E. [1]
Affiliations
[1] Univ Calif Santa Cruz, Dept Biomol Engn, Santa Cruz, CA 95064 USA
[2] Univ Nacl Autonoma Mexico, Fac Ciencias, Mexico City 04510, DF, Mexico
[3] Univ Calif Santa Cruz, Dept Ecol & Evolutionary Biol, Santa Cruz, CA 95064 USA
Source
BMC GENOMICS | 2015 / Volume 16
Keywords
Forensics; Ancient DNA; Genomics; GENOME SEQUENCE; DNA; ANCIENT; ENRICHMENT; HAPLOTYPE;
DOI
10.1186/s12864-015-2241-6
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Determining whether two DNA samples originate from the same individual is difficult when the amount of retrievable DNA is limited. This is often the case for ancient, historic, and forensic samples. The most widely used approaches rely on amplification of a defined panel of multi-allelic markers and comparison to similar data from other samples. When the amount of retrievable DNA is low, these approaches fail. Results: We describe a new method for assessing whether shotgun DNA sequence data from two samples are consistent with originating from the same or different individuals. Our approach makes use of the large catalogs of single nucleotide polymorphism (SNP) markers to maximize the chances of observing potentially discriminating alleles. We further reduce the amount of data required by taking advantage of patterns of linkage disequilibrium modeled by a reference panel of haplotypes to indirectly compare observations at pairs of linked SNPs. Using both coalescent simulations and real sequencing data from modern and ancient sources, we show that this approach is robust with respect to the reference panel and has power to detect positive identity from DNA libraries with less than 1% random and non-overlapping genome coverage in each sample. Conclusion: We present a powerful new approach that can determine whether DNA from two samples originated from the same individual even when only minute quantities of DNA are recoverable from each.
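The idea of comparing two samples indirectly through pairs of linked SNPs can be illustrated with a toy likelihood ratio. The sketch below is not the authors' model; it is a simplified illustration under stated assumptions: biallelic SNPs coded 0/1, one error-free read per sample per SNP pair, random mating, and a reference panel given as a plain list of two-SNP haplotypes. The function names `pair_log_lr` and `total_log_lr` and the `pseudo` smoothing parameter are hypothetical. Under "same individual", the two reads come from the same haplotype half the time (joint haplotype frequency) and from the two different haplotypes otherwise (product of marginal frequencies); under "different individuals", only the product of marginals applies.

```python
import math
from collections import Counter

def pair_log_lr(panel, allele_a, allele_b, pseudo=0.5):
    """Log-likelihood ratio (same vs. different individual) for one SNP pair.

    panel    : list of (a, b) tuples, one per reference haplotype (alleles 0/1)
    allele_a : allele seen in sample 1 at the first SNP
    allele_b : allele seen in sample 2 at the second, linked SNP
    pseudo   : pseudocount added to each of the four haplotype classes
               (an assumption, used only to avoid zero frequencies)
    """
    n = len(panel)
    counts = Counter(panel)
    denom = n + 4 * pseudo
    # Smoothed joint haplotype frequency and the matching marginals.
    p_joint = (counts[(allele_a, allele_b)] + pseudo) / denom
    p_a = (sum(c for (a, _), c in counts.items() if a == allele_a) + 2 * pseudo) / denom
    p_b = (sum(c for (_, b), c in counts.items() if b == allele_b) + 2 * pseudo) / denom
    p_diff = p_a * p_b                       # reads from unrelated haplotypes
    p_same = 0.5 * p_joint + 0.5 * p_diff    # same diploid individual
    return math.log(p_same / p_diff)

def total_log_lr(observations):
    """Sum per-pair log ratios, treating SNP pairs as independent (an assumption).

    observations : iterable of (panel, allele_a, allele_b)
    """
    return sum(pair_log_lr(p, a, b) for p, a, b in observations)

# Toy panels: one pair in perfect LD, one pair in linkage equilibrium.
ld_panel = [(0, 0)] * 50 + [(1, 1)] * 50
eq_panel = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
```

In this toy setting, alleles that agree with the panel's linkage pattern give a positive log ratio, alleles that conflict with it give a negative one, and a pair in linkage equilibrium contributes nothing, which is why linked SNPs carry information even when the two samples never cover the same site.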
Pages: 11
Related papers
50 items in total
  • [31] Efficient Genome-Wide Sequencing and Low-Coverage Pedigree Analysis from Noninvasively Collected Samples
    Snyder-Mackler, Noah
    Majoros, William H.
    Yuan, Michael L.
    Shaver, Amanda O.
    Gordon, Jacob B.
    Kopp, Gisela H.
    Schlebusch, Stephen A.
    Wall, Jeffrey D.
    Alberts, Susan C.
    Mukherjee, Sayan
    Zhou, Xiang
    Tung, Jenny
    GENETICS, 2016, 203 (02) : 699 - +
  • [32] Novel Methods to Optimize Genotypic Imputation for Low-Coverage, Next-Generation Sequence Data in Crop Plants
    Swarts, Kelly
    Li, Huihui
    Navarro, J. Alberto Romero
    An, Dong
    Romay, Maria Cinta
    Hearne, Sarah
    Acharya, Charlotte
    Glaubitz, Jeffrey C.
    Mitchell, Sharon
    Elshire, Robert J.
    Buckler, Edward S.
    Bradbury, Peter J.
    PLANT GENOME, 2014, 7 (03):
  • [33] Population genetic analysis of bi-allelic structural variants from low-coverage sequence data with an expectation-maximization algorithm
    Ignacio Lucas-Lledo, Jose
    Vicente-Salvador, David
    Aguado, Cristina
    Caceres, Mario
    BMC BIOINFORMATICS, 2014, 15
  • [34] Population genetic analysis of bi-allelic structural variants from low-coverage sequence data with an expectation-maximization algorithm
    José Ignacio Lucas-Lledó
    David Vicente-Salvador
    Cristina Aguado
    Mario Cáceres
    BMC Bioinformatics, 15
  • [35] Improved computations for relationship inference using low-coverage sequencing data
    Mostad, Petter
    Tillmar, Andreas
    Kling, Daniel
    BMC BIOINFORMATICS, 2023, 24 (01)
  • [36] Extremely low-coverage sequencing and imputation increases power for genome-wide association studies
    Bogdan Pasaniuc
    Nadin Rohland
    Paul J McLaren
    Kiran Garimella
    Noah Zaitlen
    Heng Li
    Namrata Gupta
    Benjamin M Neale
    Mark J Daly
    Pamela Sklar
    Patrick F Sullivan
    Sarah Bergen
    Jennifer L Moran
    Christina M Hultman
    Paul Lichtenstein
    Patrik Magnusson
    Shaun M Purcell
    David W Haas
    Liming Liang
    Shamil Sunyaev
    Nick Patterson
    Paul I W de Bakker
    David Reich
    Alkes L Price
    Nature Genetics, 2012, 44 : 631 - 635
  • [37] Extremely low-coverage whole genome sequencing in South Asians captures population genomics information
    Rustagi, Navin
    Zhou, Anbo
    Watkins, W. Scott
    Gedvilaite, Erika
    Wang, Shuoguo
    Ramesh, Naveen
    Muzny, Donna
    Gibbs, Richard A.
    Jorde, Lynn B.
    Yu, Fuli
    Xing, Jinchuan
    BMC GENOMICS, 2017, 18
  • [38] Improved computations for relationship inference using low-coverage sequencing data
    Petter Mostad
    Andreas Tillmar
    Daniel Kling
    BMC Bioinformatics, 24
  • [39] Extremely low-coverage whole genome sequencing in South Asians captures population genomics information
    Navin Rustagi
    Anbo Zhou
    W. Scott Watkins
    Erika Gedvilaite
    Shuoguo Wang
    Naveen Ramesh
    Donna Muzny
    Richard A. Gibbs
    Lynn B. Jorde
    Fuli Yu
    Jinchuan Xing
    BMC Genomics, 18
  • [40] Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes
    Rubinacci, Simone
    Hofmeister, Robin J.
    da Mota, Barbara Sousa
    Delaneau, Olivier
    EUROPEAN JOURNAL OF HUMAN GENETICS, 2024, 32 : 50 - 50