XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences

Cited by: 118
Authors
Newman, Aaron M. [1]
Cooper, James B. [1, 2]
Affiliations
[1] Univ Calif Santa Barbara, Biomol Sci & Engn Program, Santa Barbara, CA 93106 USA
[2] Univ Calif Santa Barbara, Dept Mol Cellular & Dev Biol, Santa Barbara, CA 93106 USA
DOI
10.1186/1471-2105-8-382
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed.
Results: To address limitations of current repeat identification methods, and to provide an efficient and flexible algorithm for the detection and analysis of TRs in protein sequences, we designed and implemented a new computational method called XSTREAM. Running time tests confirm the practicality of XSTREAM for analyses of multi-genome datasets. Each of the key capabilities of XSTREAM (e.g., merging, nesting, long-period detection, and TR architecture modeling) is demonstrated using anecdotal examples, and the utility of XSTREAM for identifying TR proteins was validated using data from a recently published paper.
Conclusion: We show that XSTREAM is a practical and valuable tool for TR detection in protein and nucleotide sequences at the multi-genome scale, and an effective tool for modeling TR domains with diverse architectures and varied levels of degeneracy. Because of these useful features, XSTREAM has significant potential for the discovery of naturally evolved modular proteins with applications for engineering novel biostructural and biomimetic materials and for identifying new vaccine and diagnostic targets.
Pages: 19
Source: BMC Bioinformatics, 8