A Rank-Based Sequence Aligner with Applications in Phylogenetic Analysis

Cited by: 7
Authors
Dinu, Liviu P. [1, 2]
Ionescu, Radu Tudor [1]
Tomescu, Alexandru I. [3]
Affiliations
[1] Univ Bucharest, Fac Math & Comp Sci, Bucharest, Romania
[2] Personal Genet, Bucharest, Romania
[3] Univ Helsinki, Helsinki Inst Informat Technol HIIT, Dept Comp Sci, Helsinki, Finland
Source
PLOS ONE | 2014 / Vol. 9 / Issue 08
Funding
Academy of Finland;
Keywords
RNA-SEQ; GENOME REARRANGEMENT; VIBRIO-CHOLERAE; READ ALIGNMENT; TIME ALGORITHM; DNA-SEQUENCES; DISTANCE; EVOLUTION; PATHOGEN; TOOL;
DOI
10.1371/journal.pone.0104006
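Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general];
Discipline classification code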
07 ; 0710 ; 09 ;
Abstract
Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome under Local Rank Distance (LRD). The proposed rank-based aligner prioritizes correctness over speed, although some indexing strategies that speed it up are also investigated. The first speed improvement stores k-mer positions in a hash table for each read. The second, which yields an approximate LRD aligner, considers only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state-of-the-art alignment tools in several experiments. One set of experiments determines the precision and recall of the proposed aligner in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the proposed aligner is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of remaining very accurate even at very low base coverage. Thus, the LRD aligner can be considered a good alternative to standard alignment tools, especially when alignment accuracy is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java and is supported on UNIX and MS Windows.
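The two indexing ideas mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kmer_positions` and `candidate_starts` are hypothetical, the toy sequences are made up, and the vote-counting filter is a generic seed-and-vote heuristic assumed here only to show how "likely" reference positions might be shortlisted. The LRD scoring itself is not shown.

```python
from collections import Counter, defaultdict


def kmer_positions(seq, k):
    """Map every k-mer of seq to the list of offsets where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def candidate_starts(read, reference, k, top=3):
    """Rank reference offsets by how many read k-mers vote for them.

    Assumption: a shared k-mer at reference offset i and read offset j
    votes for the read starting at reference offset i - j.
    """
    read_index = kmer_positions(read, k)  # hash table built per read
    votes = Counter()
    for i in range(len(reference) - k + 1):
        for j in read_index.get(reference[i:i + k], ()):
            start = i - j  # implied start of the read in the reference
            if 0 <= start <= len(reference) - len(read):
                votes[start] += 1
    return votes.most_common(top)


# Toy data, invented for illustration only.
reference = "ACGTTGCAAGGCTTACGATCGGA"
read = "GGCTTACG"
print(candidate_starts(read, reference, k=3))  # [(9, 6)]
```

In this toy case all six 3-mers of the read agree on offset 9, so a full distance computation would only need to be run there. Restricting the expensive comparison to a few high-vote offsets is what would make such an aligner approximate: a true best match that shares few exact k-mers with the read could be missed.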
Pages: 18