Locating tandem repeats in weighted sequences in proteins

Authors
Hui Zhang
Qing Guo
Costas S Iliopoulos
Affiliations
[1] Zhejiang University of Technology, College of Computer Science and Technology
[2] Zhejiang University, College of Computer Science and Engineering (corresponding author)
[3] King's College London, Strand, Department of Computer Science
Keywords
Equivalence Class; Tandem Repeat; Independent Component Analysis; Weighted Sequence; Nonnegative Matrix Factorization
DOI
Not available
Abstract
A weighted biological sequence is a string in which a set of characters may appear at each position, each with its own probability of occurrence. We aim to locate all tandem repeats in a weighted sequence. A repeated substring is called a tandem repeat if its occurrences are directly adjacent to one another. By introducing equivalence classes for weighted sequences, we identify the tandem repeats of every possible length using an iterative partitioning technique. We also present an algorithm for recording the tandem repeats and prove that the problem can be solved in O(n^2) time.
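To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the equivalence-class partitioning algorithm the abstract describes, and it does not achieve that algorithm's time bound. It rests on assumptions the abstract does not state: a weighted sequence is modeled as a list of per-position dictionaries mapping characters to probabilities, a substring counts as occurring at a position only if the product of its character probabilities reaches a chosen `threshold`, and a tandem repeat is reported when the same string occurs at two directly adjacent positions. The names `solid_factors` and `tandem_repeats` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: one dict per position, character -> probability.
WeightedSeq = List[Dict[str, float]]


def solid_factors(seq: WeightedSeq, start: int, length: int,
                  threshold: float) -> Set[str]:
    """Strings of the given length starting at `start` whose probability
    (product of per-position probabilities) is at least `threshold`."""
    partial = {"": 1.0}
    for pos in range(start, start + length):
        extended = {}
        for prefix, prob in partial.items():
            for ch, p in seq[pos].items():
                q = prob * p
                # Probabilities never exceed 1, so a prefix that falls
                # below the threshold can be discarded safely.
                if q >= threshold:
                    extended[prefix + ch] = q
        partial = extended
        if not partial:
            break
    return set(partial)


def tandem_repeats(seq: WeightedSeq,
                   threshold: float) -> List[Tuple[int, int, str]]:
    """Return (start, period, unit) for every string `unit` of length
    `period` that occurs at `start` and again at `start + period`."""
    n = len(seq)
    found = []
    for period in range(1, n // 2 + 1):
        for i in range(n - 2 * period + 1):
            left = solid_factors(seq, i, period, threshold)
            right = solid_factors(seq, i + period, period, threshold)
            for unit in sorted(left & right):
                found.append((i, period, unit))
    return found


# Illustrative input (made up for this sketch): position 1 is uncertain.
example = [{"A": 1.0}, {"C": 0.5, "G": 0.5}, {"A": 1.0}, {"C": 1.0}]
print(tandem_repeats(example, 0.25))
```

With the threshold set to 0.25, the string "AC" occurs at position 0 with probability 0.5 and at position 2 with probability 1.0, so the sketch reports it as a tandem repeat of period 2. Raising the threshold above 0.5 removes that occurrence.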