Error and Error Mitigation in Low-Coverage Genome Assemblies

Cited by: 30
Authors
Hubisz, Melissa J. [1]
Lin, Michael F. [2, 3]
Kellis, Manolis [2, 3, 4]
Siepel, Adam [1, 5]
Affiliations
[1] Cornell Univ, Dept Biol Stat & Computat Biol, Ithaca, NY 14853 USA
[2] MIT, Broad Inst, Cambridge, MA 02139 USA
[3] Harvard Univ, Cambridge, MA 02138 USA
[4] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[5] Cornell Univ, Cornell Ctr Comparat & Populat Genom, Ithaca, NY USA
Source
PLOS ONE | 2011 / Vol. 6 / Issue 02
Keywords
DNA-SEQUENCES; ACCURACY; IDENTIFICATION; ALIGNMENTS; ARACHNE; MOUSE; TREES;
DOI
10.1371/journal.pone.0017034
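Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09
Abstract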
The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ~2x coverage. Here we examine the extent of sequencing error in these 2x assemblies, and its potential impact in downstream analyses. By comparing 2x assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1-4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of overcorrection, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download.
Pages: 13