Locating tandem repeats in weighted sequences in proteins

Cited by: 0
Authors
Hui Zhang
Qing Guo
Costas S Iliopoulos
Affiliations
[1] Zhejiang University of Technology, College of Computer Science and Technology
[2] Zhejiang University, College of Computer Science and Engineering (corresponding author)
[3] King's College London, Strand, Department of Computer Science
Source
Keywords
Equivalence Class; Tandem Repeat; Independent Component Analysis; Weighted Sequence; Nonnegative Matrix Factorization;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
A weighted biological sequence is a string in which, at each position, a set of characters may appear, each with its own probability of occurrence. We address the problem of locating all tandem repeats in a weighted sequence. A repeated substring is called a tandem repeat if its occurrences are directly adjacent to one another. By introducing the notion of equivalence classes in weighted sequences, we identify the tandem repeats of every possible length using an iterative partitioning technique. We also present an algorithm for recording the tandem repeats and prove that the problem can be solved in O(n^2) time.
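To make the definitions in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the equivalence-class partitioning algorithm the abstract describes and makes no attempt to reach the O(n^2) bound. Several things are assumptions for illustration only: the representation of a weighted sequence as a list of per-position probability dictionaries, the function name `find_tandem_repeats`, and the `threshold` parameter, which treats a square u·u as occurring when the product of its character probabilities is at least that value (the abstract itself does not state an occurrence criterion).

```python
from typing import Dict, List, Tuple

# Assumed representation: one dict per position, mapping character -> probability.
WeightedSeq = List[Dict[str, float]]


def find_tandem_repeats(seq: WeightedSeq, threshold: float) -> List[Tuple[int, str, float]]:
    """Naively list squares u.u whose occurrence probability is >= threshold.

    Returns (start, unit, probability) triples. The probability of a square is
    taken to be the product of the probabilities of all its characters, which
    is an assumption of this sketch.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")

    n = len(seq)
    found: List[Tuple[int, str, float]] = []
    for period in range(1, n // 2 + 1):
        for start in range(0, n - 2 * period + 1):
            # Grow the repeated unit one character at a time. Probabilities
            # never increase as the unit grows, so a partial unit that falls
            # below the threshold can be discarded immediately.
            stack = [("", 1.0)]
            while stack:
                unit, prob = stack.pop()
                j = len(unit)
                if j == period:
                    found.append((start, unit, prob))
                    continue
                left = seq[start + j]             # j-th character of first copy
                right = seq[start + period + j]   # j-th character of second copy
                for ch, p_left in left.items():
                    p = prob * p_left * right.get(ch, 0.0)
                    if p >= threshold:
                        stack.append((unit + ch, p))
    return found


# Position 1 is uncertain: C or G, each with probability 0.5.
example: WeightedSeq = [{"A": 1.0}, {"C": 0.5, "G": 0.5}, {"A": 1.0}, {"C": 1.0}]
print(find_tandem_repeats(example, 0.25))  # prints [(0, 'AC', 0.5)]
```

In the example, the only square that survives is "ACAC" starting at position 0, with probability 0.5 because position 1 reads C only half of the time. Raising the threshold above 0.5 removes it.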
Related papers
50 in total
  • [41] Tandem and cryptic amino acid repeats accumulate in disordered regions of proteins
    Simon, Michelle
    Hancock, John M.
    GENOME BIOLOGY, 2009, 10 (06):
  • [42] RepeatsDB in 2025: expanding annotations of structured tandem repeats proteins on AlphaFoldDB
    Clementel, Damiano
    Arrias, Paula Nazarena
    Mozaffari, Soroush
    Osmanli, Zarifa
    Castro, Ximena Aixa
    Ferrari, Carlo
    Kajava, Andrey V.
    Tosatto, Silvio C. E.
    Monzon, Alexander Miguel
    NUCLEIC ACIDS RESEARCH, 2024, 53 (D1) : D575 - D581
  • [43] Chemically Modified Tandem Repeats in Proteins: Natural Combinatorial Peptide Libraries
    Fuchs, Stephen M.
    ACS CHEMICAL BIOLOGY, 2013, 8 (02) : 275 - 282
  • [45] ProRepeat: an integrated repository for studying amino acid tandem repeats in proteins
    Luo, Hong
    Lin, Ke
    David, Audrey
    Nijveen, Harm
    Leunissen, Jack A. M.
    NUCLEIC ACIDS RESEARCH, 2012, 40 (D1) : D394 - D399
  • [46] Tandem clusters of membrane proteins in complete genome sequences
    Kihara, D
    Kanehisa, M
    GENOME RESEARCH, 2000, 10 (06) : 731 - 743
  • [47] Offline: Tandem repeats
    Horton, Richard
    LANCET, 2010, 376 (9756) : 1886 - 1886
  • [48] THE PENICILLIN GENE-CLUSTER IS AMPLIFIED IN TANDEM REPEATS LINKED BY CONSERVED HEXANUCLEOTIDE SEQUENCES
    FIERRO, F
    BARREDO, JL
    DIEZ, B
    GUTIERREZ, S
    FERNANDEZ, FJ
    MARTIN, JF
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1995, 92 (13) : 6200 - 6204
  • [49] Detection of significant patterns by compression algorithms: The case of approximate tandem repeats in DNA sequences
    Rivals, E
    Delgrange, O
    Delahaye, JP
    Dauchet, M
    Delorme, MO
    Henaut, A
    Ollivier, E
    COMPUTER APPLICATIONS IN THE BIOSCIENCES, 1997, 13 (02) : 131 - 136
  • [50] An Algorithm to Solve the Motif Alignment Problem for Approximate Nested Tandem Repeats in Biological Sequences
    Matroud, Atheer A.
    Tuffley, Christopher P.
    Hendy, Michael D.
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2011, 18 (09) : 1211 - 1218