Searching for supermaximal repeats in large DNA sequences

Authors
Lian, Chen Na [1]
Halachev, Mihail [1]
Shiri, Nematollaah [1]
Affiliations
[1] Concordia Univ, Dept Comp Sci & Software Engn, Montreal, PQ, Canada
Keywords
DNA sequences; supermaximal repeats; suffix tree; performance
DOI
Not available
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
We study the problem of finding supermaximal repeats in large DNA sequences. To this end, we propose an algorithm called SMR, which uses an auxiliary index structure (POL) that is derived from, and replaces, the suffix tree index ST-FD64 [1]. The results of our numerous experiments on the 24 human chromosomes indicate that SMR outperforms the solution provided as part of the Vmatch [2] software tool. When searching for supermaximal repeats of at least 10 bases, SMR is twice as fast as Vmatch; for a minimum length of 25 bases, SMR is 7 times faster; and for repeats of length at least 200, SMR is about 9 times faster. We also study the cost of POL in terms of time and space requirements.
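For readers unfamiliar with the term, a supermaximal repeat is a substring that occurs at least twice and is not contained in any longer repeated substring. The following is a minimal brute-force sketch of that definition only. It is not the SMR algorithm, it does not use POL, ST-FD64, or any suffix tree, and it is practical only for short toy sequences. The function name `supermaximal_repeats` and the parameter `min_len` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def supermaximal_repeats(seq, min_len=1):
    """Return {repeat: [start positions]} for every supermaximal repeat
    of length >= min_len in seq.

    Illustrative brute force: enumerates all substrings, so time and
    memory grow quadratically (or worse) with len(seq).
    """
    n = len(seq)

    # Record the start positions of every substring of seq.
    positions = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n + 1):
            positions[seq[i:j]].append(i)

    # A repeat is any substring with at least two occurrences
    # (overlapping occurrences count).
    repeats = {w for w, starts in positions.items() if len(starts) >= 2}

    alphabet = set(seq)
    result = {}
    for w in repeats:
        if len(w) < min_len:
            continue
        # If w lies inside a longer repeat, then some one-character
        # extension of w (to the left or to the right) is itself a repeat.
        # So w is supermaximal exactly when no such extension repeats.
        extendable = any(
            (c + w) in repeats or (w + c) in repeats for c in alphabet
        )
        if not extendable:
            result[w] = positions[w]
    return result
```

For the toy sequence "ACGTTACGAGTT", calling `supermaximal_repeats("ACGTTACGAGTT")` reports "ACG" at positions 0 and 5 and "GTT" at positions 2 and 9 (0-based). Shorter repeats such as "CG" or "TT" are excluded because each is contained in one of those longer repeats. The `min_len` argument plays the role of the minimum repeat length mentioned in the abstract (10, 25, or 200 bases).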
Pages: 87-101
Number of pages: 15