Joining Illumina paired-end reads for classifying phylogenetic marker sequences

Cited by: 19
Authors
Liu, Tsunglin [1]
Chen, Chen-Yu [1]
Chen-Deng, An [1]
Chen, Yi-Lin [2]
Wang, Jiu-Yao [3,4]
Hou, Yung-I [3]
Lin, Min-Ching [2]
Affiliations
[1] Natl Cheng Kung Univ, Dept Biotechnol & Bioind Sci, Tainan 701, Taiwan
[2] Natl Cheng Kung Univ Hosp, Dept Pathol, Mol Diagnost Lab, Tainan, Taiwan
[3] Natl Cheng Kung Univ, Coll Med, Ctr Allergy & Clin Immunol Res, Tainan, Taiwan
[4] Natl Cheng Kung Univ, Coll Med, Dept Pediat, Tainan, Taiwan
Keywords
Metagenomics; 16S; Illumina paired-end; Taxonomy annotation; Read joining; CLASSIFICATION; METAGENOMICS;
DOI
10.1186/s12859-020-3445-6
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Illumina sequencing of a marker gene is popular in metagenomic studies. However, Illumina paired-end (PE) reads sometimes cannot be merged into single reads for subsequent analysis. When mergeable PE reads are limited, one can simply use only first reads for taxonomy annotation, but that wastes information in the second reads. Presumably, including second reads should improve taxonomy annotation. However, a rigorous investigation of how best to do this and how much can be gained has not been reported.
Results: We evaluated two methods of joining as opposed to merging PE reads into single reads for taxonomy annotation using simulated data with sequencing errors. Our rigorous evaluation involved several top classifiers (RDP classifier, SINTAX, and two alignment-based methods) and realistic benchmark datasets. For most classifiers, read joining ameliorated the impact of sequencing errors and improved the accuracy of taxonomy predictions. For alignment-based top-hit classifiers, rearranging the reference sequences is recommended to avoid improper alignments of joined reads. For word-counting classifiers, joined reads could be compared to the original reference for classification. We also applied read joining to our own real MiSeq PE data of nasal microbiota of asthmatic children. Before joining, trimming low quality bases was necessary for optimizing taxonomy annotation and sequence clustering. We then showed that read joining increased the amount of effective data for taxonomy annotation. Using these joined trimmed reads, we were able to identify two promising bacterial genera that might be associated with asthma exacerbation.
Conclusions: When mergeable PE reads are limited, joining them into single reads for taxonomy annotation is always recommended. Reference sequences may need to be rearranged accordingly depending on the classifier. Read joining also relaxes the constraint on primer selection, and thus may unleash the full capacity of Illumina PE data for taxonomy annotation. Our work provides guidance for fully utilizing PE data of a marker gene when mergeable reads are limited.
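The abstract contrasts joining with merging but does not spell out the two joining methods that were evaluated. As a minimal sketch of the general idea only, the Python below assumes the simplest possible scheme: read 1 is concatenated with the reverse complement of read 2, so that both halves lie on the same strand, with no overlap detection. The function names `reverse_complement` and `join_pair` and the optional `spacer` parameter are hypothetical, introduced here for illustration, and are not taken from the paper.

```python
# Hypothetical illustration of "joining" a read pair without overlap merging.
# Assumption: read 2 is reverse-complemented and appended to read 1.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_pair(read1: str, read2: str, spacer: str = "") -> str:
    """Concatenate read 1 and the reverse complement of read 2.

    `spacer` is an optional string placed between the two parts
    (for example a run of N); it is empty by default.
    """
    return read1 + spacer + reverse_complement(read2)


if __name__ == "__main__":
    print(join_pair("ACGTTG", "TTAGGC"))  # ACGTTGGCCTAA
```

Because the joined read skips the unsequenced middle of the amplicon, an alignment-based classifier may need reference sequences arranged to match, which is the point the abstract makes about rearranging references.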
Pages: 13
Journal: BMC Bioinformatics, 21