An MCMC algorithm for detecting short adjacent repeats shared by multiple sequences

Cited by: 2
Authors
Li, Qiwei [1 ]
Fan, Xiaodan [1 ]
Liang, Tong [2 ]
Li, Shuo-Yen R. [2 ]
Affiliations
[1] Chinese Univ Hong Kong, Dept Stat, Sha Tin, Hong Kong, Peoples R China
[2] Chinese Univ Hong Kong, Dept Informat Engn, Sha Tin, Hong Kong, Peoples R China
Keywords
TANDEM REPEATS; DNA-SEQUENCES; EXON-III; IDENTIFICATION; ALIGNMENT; ORIGINS; DISEASE; FINDER; GENES;
DOI
10.1093/bioinformatics/btr287
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: Repeat detection problems are traditionally formulated as string matching or signal processing problems. These formulations cannot readily handle gaps between repeat units and cannot detect repeat patterns shared by multiple sequences. This study detects short adjacent repeats with interunit insertions from multiple sequences. For biological sequences, such studies can shed light on molecular structure, biological function and evolution. Results: The task of detecting short adjacent repeats is formulated as a statistical inference problem using a probabilistic generative model. A Markov chain Monte Carlo algorithm is proposed to infer the parameters in a de novo fashion. Its application to synthetic and real biological data shows that the new method not only has a competitive edge over existing methods, but can also provide a way to study the structure and evolution of repeat-containing genes.
Pages: 1772 - 1779
Number of pages: 8
Related papers
50 in total
  • [21] Algorithm of detecting structural variations in DNA sequences
    Nalecz-Charkielwicz, Katarzyna
    Nowak, Robert
    PHOTONICS APPLICATIONS IN ASTRONOMY, COMMUNICATIONS, INDUSTRY, AND HIGH-ENERGY PHYSICS EXPERIMENTS 2014, 2014, 9290
  • [22] Detecting homogenous predictors in high-dimensional panel model with an MCMC algorithm
    Luo, Ronghua
    Lan, Wei
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2017, 46 (09) : 7376 - 7392
  • [23] Autoregressive models for spectral analysis of short tandem repeats in DNA sequences
    Hongxia Zhou
    Hong Yan
    2006 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-6, PROCEEDINGS, 2006, : 1286 - +
  • [24] Detecting localized repeats in genomic sequences: A new strategy and its application to Bacillus subtilis and Arabidopsis thaliana sequences
    Klaerr-Blanchard, M
    Chiapello, H
    Coward, E
    COMPUTERS & CHEMISTRY, 2000, 24 (01) : 57 - 70
  • [25] Distribution of interstitial telomere-like repeats and their adjacent sequences in a dioecious plant, Silene latifolia
    Wakana Uchida
    Sachihiro Matsunaga
    Ryuji Sugiyama
    Fukashi Shibata
    Yusuke Kazama
    Yutaka Miyazawa
    Masahiro Hizume
    Shigeyuki Kawano
    Chromosoma, 2002, 111 : 313 - 320
  • [26] Distribution of interstitial telomere-like repeats and their adjacent sequences in a dioecious plant, Silene latifolia
    Uchida, W
    Matsunaga, S
    Sugiyama, R
    Shibata, F
    Kazama, Y
    Miyazawa, Y
    Hizume, M
    Kawano, S
    CHROMOSOMA, 2002, 111 (05) : 313 - 320
  • [27] Mining shared emerging sequences from multiple datasets
    Chen, Xiangtao
    Wang, Jing
    Ding, Pingjian
    Zhongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Central South University (Science and Technology), 2015, 46 (11) : 4091 - 4099
  • [28] Shared structure facilitates working memory of multiple sequences
    Huang, Qiaoli
    Luo, Huan
    ELIFE, 2024, 12
  • [29] The multiple alignments of very short sequences
    Takacs, Kristof
    Grolmusz, Vince
    FASEB BIOADVANCES, 2021, 3 (07) : 523 - 530
  • [30] A Fast Shot Transition Detecting Algorithm on MPEG Sequences
    Zheng Peng
    Department of Computer Science
    Wuhan University Journal of Natural Sciences, 2003, (02) : 358 - 362