XSTREAM: A practical algorithm for identification and architecture modeling of tandem repeats in protein sequences

Cited by: 118
Authors
Newman, Aaron M. [1]
Cooper, James B. [1, 2]
Affiliations
[1] Univ Calif Santa Barbara, Biomol Sci & Engn Program, Santa Barbara, CA 93106 USA
[2] Univ Calif Santa Barbara, Dept Mol Cellular & Dev Biol, Santa Barbara, CA 93106 USA
DOI
10.1186/1471-2105-8-382
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Biological sequence repeats arranged in tandem patterns are widespread in DNA and proteins. While many software tools have been designed to detect DNA tandem repeats (TRs), useful algorithms for identifying protein TRs with varied levels of degeneracy are still needed.
Results: To address limitations of current repeat identification methods, and to provide an efficient and flexible algorithm for the detection and analysis of TRs in protein sequences, we designed and implemented a new computational method called XSTREAM. Running time tests confirm the practicality of XSTREAM for analyses of multi-genome datasets. Each of the key capabilities of XSTREAM (e.g., merging, nesting, long-period detection, and TR architecture modeling) is demonstrated using anecdotal examples, and the utility of XSTREAM for identifying TR proteins was validated using data from a recently published paper.
Conclusion: We show that XSTREAM is a practical and valuable tool for TR detection in protein and nucleotide sequences at the multi-genome scale, and an effective tool for modeling TR domains with diverse architectures and varied levels of degeneracy. Because of these useful features, XSTREAM has significant potential for the discovery of naturally evolved modular proteins with applications in engineering novel biostructural and biomimetic materials and in identifying new vaccine and diagnostic targets.
Pages: 19