Finding the biologically optimal alignment of multiple sequences

Times cited: 6
Author
Mamitsuka, H [1]
Affiliation
[1] Kyoto Univ, Inst Chem Res, Uji 6110011, Japan
Keywords
multiple sequence alignment; multiple columns model; similarity scores; deterministic annealing; maximum entropy
DOI
10.1016/j.artmed.2005.01.007
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Objective: Deterministic annealing, which is derived from statistical physics, is a method for obtaining the global optimum in parameter space. During the annealing process, which starts at a high temperature that is then gradually lowered, deterministic annealing deterministically finds the (global) optimum at each temperature. Deterministic annealing is therefore expected to be more computationally efficient than stochastic sampling strategies for obtaining the global optimum. We propose to apply the deterministic annealing technique to the problem of efficiently finding the biologically optimal alignment of multiple sequences.
Methods and material: We take a strategy based on probabilistic models for aligning multiple sequences. That is, we train a probabilistic model on the given training sequences and obtain their alignment by parsing, i.e. by searching for the most likely parse of each sequence, including gaps, using the trained parameters of the model. In this scenario, we propose a new stochastic model that is simple enough to suit multiple sequence alignment and that, unlike existing stochastic models such as the profile hidden Markov model (HMM), allows us to use similarity scores between symbols (or between a symbol and a gap). We further present a learning algorithm for our simple model that combines deterministic annealing with an expectation-maximization (EM) algorithm. We emphasize that our approach is time-efficient, even though training is done through an annealing process.
Results: In our experiments, we used actual protein sequences whose three-dimensional (3D) structures have been determined and which are all aligned based on those 3D structures. We compared the results obtained by our approach with those of other existing approaches. Experimental results clearly showed that, among the approaches tested, ours gave the best performance in terms of similarity to the structurally determined alignment. Experimental results further indicated that our approach was ten times more efficient in terms of actual computation time than a competing method.
(c) 2005 Elsevier B.V. All rights reserved.
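The abstract does not give the equations of the proposed multiple columns model, so the sketch below illustrates only the generic technique it names: deterministic annealing combined with EM (often called DAEM). As a stand-in, it fits a toy one-dimensional Gaussian mixture, not an alignment model. The E-step posteriors are raised to the power β = 1/T and renormalized, and β is raised step by step until it reaches 1, where the updates become ordinary EM. The function name `daem_gmm_1d`, the fixed shared variance `sigma2`, the schedule in `betas`, and the small re-perturbation of the means at each temperature are all illustrative assumptions, not details taken from the paper.

```python
import math
import random


def daem_gmm_1d(xs, k=2, sigma2=1.0, betas=(0.01, 0.03, 0.1, 0.3, 1.0),
                iters=50, seed=0):
    """Deterministic-annealing EM (DAEM) for a toy 1-D Gaussian mixture.

    Hypothetical sketch: the variance sigma2 is fixed and shared by all
    components; only the mixing weights and the means are learned.
    """
    rng = random.Random(seed)
    n = len(xs)
    center = sum(xs) / n
    pis = [1.0 / k] * k
    mus = [center] * k  # at high temperature all components sit at the data mean
    for beta in betas:  # beta = 1/T; beta = 1.0 is ordinary EM
        # Re-perturb the means at each temperature so that still-merged
        # components can split once beta passes a critical value.
        mus = [m + 1e-2 * rng.uniform(-1.0, 1.0) for m in mus]
        for _ in range(iters):
            # E-step: tempered responsibilities r_ij ∝ (pi_j * N(x_i | mu_j, sigma2))^beta.
            # The Gaussian normalizing constant is shared by all components, so it cancels.
            resp = []
            for x in xs:
                logs = [beta * (math.log(pis[j]) - (x - mus[j]) ** 2 / (2.0 * sigma2))
                        for j in range(k)]
                top = max(logs)  # subtract the max for numerical stability
                w = [math.exp(v - top) for v in logs]
                s = sum(w)
                resp.append([v / s for v in w])
            # M-step: ordinary weighted maximum-likelihood updates.
            for j in range(k):
                nj = max(sum(r[j] for r in resp), 1e-12)
                pis[j] = nj / n
                mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
    return pis, mus


if __name__ == "__main__":
    rng = random.Random(1)
    data = ([rng.gauss(-5.0, 1.0) for _ in range(100)]
            + [rng.gauss(5.0, 1.0) for _ in range(100)])
    weights, means = daem_gmm_1d(data, k=2)
    print("weights:", [round(w, 3) for w in weights])
    print("means:", [round(m, 3) for m in sorted(means)])
```

At small β the tempered posteriors are nearly uniform, so all components stay at the data mean; once β passes a critical value the merged components split, and the later temperatures refine them. The re-perturbation step matters because, below the critical β, any difference between the means shrinks until it is lost to floating-point rounding.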
Pages: 9-18
Number of pages: 10
Related papers
50 in total
  • [1] OPTIMAL ALIGNMENT BETWEEN GROUPS OF SEQUENCES AND ITS APPLICATION TO MULTIPLE SEQUENCE ALIGNMENT
    GOTOH, O
    COMPUTER APPLICATIONS IN THE BIOSCIENCES, 1993, 9 (03): 361-370
  • [2] Optimal Distance Matrix for Multiple Alignment of Amino acid sequences
    Ohshiro, Ayako
    Okazaki, Takeo
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2018, 18 (10): 24-28
  • [3] QOMA: quasi-optimal multiple alignment of protein sequences
    Zhang, Xu
    Kahveci, Tamer
    BIOINFORMATICS, 2007, 23 (02): 162-168
  • [4] ON MULTIPLE ALIGNMENT OF GENOME SEQUENCES
    OHYA, M
    MIYAZAKI, S
    OGATA, K
    IEICE TRANSACTIONS ON COMMUNICATIONS, 1992, E75B (06): 453-457
  • [5] Finding Optimal Alignment and Consensus of Circular Strings
    Lee, Taehyung
    Na, Joong Chae
    Park, Heejin
    Park, Kunsoo
    Sim, Jeong Seop
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2010, 6129: 310-+
  • [6] Finding consensus and optimal alignment of circular strings
    Lee, Taehyung
    Na, Joong Chae
    Park, Heejin
    Park, Kunsoo
    Sim, Jeong Seop
    THEORETICAL COMPUTER SCIENCE, 2013, 468: 92-101
  • [7] MULTIPLE ALIGNMENT OF SEQUENCES ON PARALLEL COMPUTERS
    DATE, S
    KULKARNI, R
    KULKARNI, B
    KULKARNIKALE, U
    KOLASKAR, AS
    COMPUTER APPLICATIONS IN THE BIOSCIENCES, 1993, 9 (04): 397-402
  • [8] A MULTIPLE ALIGNMENT PROGRAM FOR PROTEIN SEQUENCES
    SANTIBANEZ, M
    ROHDE, K
    COMPUTER APPLICATIONS IN THE BIOSCIENCES, 1987, 3 (02): 111-114
  • [9] Pareto-optimal alignment of biological sequences
    Roytberg, MA
    Semionenkov, MN
    Tabolina, OY
    BIOFIZIKA, 1999, 44 (04): 581-594
  • [10] Multiple sequence alignment based on profile alignment of intermediate sequences
    Lu, Yue
    Sze, Sing-Hoi
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, PROCEEDINGS, 2007, 4453: 283-+