Reference-Free Validation of Short Read Data

Cited by: 20
Authors
Schroeder, Jan [1, 2]
Bailey, James [1, 2]
Conway, Thomas [2]
Zobel, Justin [1, 2]
Affiliations
[1] Univ Melbourne, Dept Comp Sci & Software Engn, Parkville, Vic 3052, Australia
[2] NICTA Victoria Res Lab, Parkville, Vic, Australia
Source
PLOS ONE | 2010 / Volume 5 / Issue 09
Funding
Australian Research Council;
Keywords
GENOME;
DOI
10.1371/journal.pone.0012681
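Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code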
07 ; 0710 ; 09 ;
Abstract
Background: High-throughput DNA sequencing techniques offer the ability to rapidly and cheaply sequence material such as whole genomes. However, the short-read data produced by these techniques can be biased or compromised at several stages in the sequencing process; the sources and properties of some of these biases are not always known. Accurate assessment of bias is required for experimental quality control, genome assembly, and interpretation of coverage results. An additional challenge is that, for new genomes or material from an unidentified source, there may be no reference available against which the reads can be checked. Results: We propose analytical methods for identifying biases in a collection of short reads, without recourse to a reference. These, in conjunction with existing approaches, comprise a methodology that can be used to quantify the quality of a set of reads. Our methods involve use of three different measures: analysis of base calls; analysis of k-mers; and analysis of distributions of k-mers. We apply our methodology to a wide range of short read data and show that, surprisingly, strong biases appear to be present. These include gross overrepresentation of some poly-base sequences, per-position biases towards some bases, and apparent preferences for some starting positions over others. Conclusions: The existence of biases in short read data is known, but they appear to be greater and more diverse than identified in previous literature. Statistical analysis of a set of short reads can help identify issues prior to assembly or resequencing, and should help guide chemical or statistical methods for bias rectification.
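The three measures named in the abstract (base calls, k-mers, and distributions of k-mers) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy reads, and the choice of k = 3 are all assumptions made for illustration, and the paper's actual statistics may differ.

```python
from collections import Counter


def per_position_base_fractions(reads):
    """Fraction of each base observed at each read position.

    Hypothetical helper: reads may differ in length; each position is
    normalised by the number of reads that reach it.
    """
    counts = []
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    return [
        {base: n / sum(c.values()) for base, n in c.items()}
        for c in counts
    ]


def kmer_counts(reads, k):
    """Count every overlapping k-mer across all reads (no reference needed)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def count_spectrum(counts):
    """Map each multiplicity n to the number of distinct k-mers seen n times."""
    return Counter(counts.values())


# Toy input (assumed data, not from the paper).
reads = ["ACGTAC", "ACGTTT", "AAAAAA"]
fractions = per_position_base_fractions(reads)
kmers = kmer_counts(reads, 3)
spectrum = count_spectrum(kmers)
```

In a sketch like this, a base fraction far from the expected composition at one position would hint at a per-position bias, and a poly-base k-mer such as `AAA` with an unusually high count would hint at overrepresentation; what counts as "unusual" is left to whatever statistical test the analyst chooses.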
Pages: 1 / 11
Number of pages: 11