An MCMC algorithm for detecting short adjacent repeats shared by multiple sequences

Cited by: 2
Authors
Li, Qiwei [1 ]
Fan, Xiaodan [1 ]
Liang, Tong [2 ]
Li, Shuo-Yen R. [2 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Stat, Sha Tin, Hong Kong, Peoples R China
[2] Chinese Univ Hong Kong, Dept Informat Engn, Sha Tin, Hong Kong, Peoples R China
Keywords
TANDEM REPEATS; DNA-SEQUENCES; EXON-III; IDENTIFICATION; ALIGNMENT; ORIGINS; DISEASE; FINDER; GENES;
DOI
10.1093/bioinformatics/btr287
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Repeat detection problems are traditionally formulated as string matching or signal processing problems. These formulations cannot readily handle gaps between repeat units and are incapable of detecting repeat patterns shared by multiple sequences. This study detects short adjacent repeats with interunit insertions from multiple sequences. For biological sequences, such studies can shed light on molecular structure, biological function and evolution.
Results: The task of detecting short adjacent repeats is formulated as a statistical inference problem by using a probabilistic generative model. A Markov chain Monte Carlo (MCMC) algorithm is proposed to infer the parameters in a de novo fashion. Applications to synthetic and real biological data show that the new method not only has a competitive edge over existing methods but also provides a way to study the structure and evolution of repeat-containing genes.
Pages: 1772 - 1779
Page count: 8
Related papers
50 in total
  • [1] An Evolutionary Monte Carlo Algorithm for Identifying Short Adjacent Repeats in Multiple Sequences
    Xu, Jin
    Li, Qiwei
    Fan, Xiaodan
    Li, Victor O. K.
    Li, Shuo-Yen Robert
    2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2010, : 643 - 648
  • [2] Finding identical sequence repeats in multiple protein sequences: An algorithm
    Vikas Kumar Maurya
    Madhumathi Sanjeevi
    Chandrasekar Narayanan Rahul
    Ajitha Mohan
    Dhanalakshmi Ramachandran
    Rashmi Siddalingappa
    Roshan Rauniyar
    Sekar Kanagaraj
    Journal of Biosciences, 49
  • [3] Finding identical sequence repeats in multiple protein sequences: An algorithm
    Maurya, Vikas Kumar
    Sanjeevi, Madhumathi
    Rahul, Chandrasekar Narayanan
    Mohan, Ajitha
    Ramachandran, Dhanalakshmi
    Siddalingappa, Rashmi
    Rauniyar, Roshan
    Kanagaraj, Sekar
    JOURNAL OF BIOSCIENCES, 2024, 49 (01)
  • [4] A novel algorithm for detecting multiple covariance and clustering of biological sequences
    Shen, Wei
    Li, Yan
    SCIENTIFIC REPORTS, 2016, 6
  • [5] A novel algorithm for detecting multiple covariance and clustering of biological sequences
    Wei Shen
    Yan Li
    Scientific Reports, 6
  • [6] Spectral Method for Detecting Inexact Repeats in Character Sequences
    Pankratov, A. N.
    Pankratova, N. M.
    PATTERN RECOGNITION AND IMAGE ANALYSIS, 2022, 32 (03) : 622 - 625
  • [7] Spectral Method for Detecting Inexact Repeats in Character Sequences
    A. N. Pankratov
    N. M. Pankratova
    Pattern Recognition and Image Analysis, 2022, 32 : 622 - 625
  • [8] CLOSELY ADJACENT SATELLITE, SUBTELOMERIC AND TELOMERIC REPEATS SHARED IN MOUSE AND RAT
    BROCCOLI, D
    MILLER, OJ
    MILLER, DA
    AMERICAN JOURNAL OF HUMAN GENETICS, 1991, 49 (04) : 296 - 296
  • [9] Multiple alignment of protein sequences with repeats and rearrangements
    Phuong, Tu Minh
    Do, Chuong B.
    Edgar, Robert C.
    Batzoglou, Serafim
    NUCLEIC ACIDS RESEARCH, 2006, 34 (20) : 5932 - 5942
  • [10] NUCLEOTIDE-SEQUENCES OF THE RETROVIRAL LONG TERMINAL REPEATS AND THEIR ADJACENT REGIONS
    CHEN, HR
    BARKER, WC
    NUCLEIC ACIDS RESEARCH, 1984, 12 (04) : 1767 - 1778