Finding the biologically optimal alignment of multiple sequences

Cited by: 6

Authors:

Mamitsuka, H [1]

Affiliations:

[1] Kyoto Univ, Inst Chem Res, Uji 6110011, Japan

Source:

ARTIFICIAL INTELLIGENCE IN MEDICINE | 2005 / Vol. 35 / Issue 1-2

Keywords:

multiple sequence alignment; multiple columns model; similarity scores; deterministic annealing; maximum entropy;

DOI:

10.1016/j.artmed.2005.01.007

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Objective: Deterministic annealing, which is derived from statistical physics, is a method for obtaining the global optimum in parameter space. During the annealing process, which starts at a high temperature that is then gradually lowered, deterministic annealing deterministically finds the (global) optimum at each temperature. Thus, deterministic annealing is expected to be more computationally efficient than stochastic sampling strategies at obtaining the global optimum. We propose to apply the deterministic annealing technique to the problem of efficiently finding the biologically optimal alignment of multiple sequences.

Methods and material: We take a strategy based on probabilistic models for aligning multiple sequences. That is, we train a probabilistic model using given training sequences and obtain their alignment by parsing, i.e. searching for the most likely parse of each sequence and gaps using the trained parameters of the model. In this scenario, we propose a new stochastic model, which is simple enough to be suited to multiple sequence alignment and which, unlike existing stochastic models such as a profile hidden Markov model (HMM), allows us to use similarity scores between symbols (or between a symbol and a gap). We further present a learning algorithm for our simple model that combines deterministic annealing with an expectation-maximization (EM) algorithm. We emphasize that our approach is time-efficient, even though the training is done through an annealing process.

Results: In our experiments, we used actual protein sequences whose three-dimensional (3D) structures have been determined and which are all aligned based on their 3D structures. We compared the results obtained by our approach with those of other existing approaches. Experimental results clearly showed that our approach gave the best performance, in terms of similarity to the structurally determined alignment, among the approaches tested. Experimental results further indicated that our approach was ten times more efficient in terms of actual computation time than a competing method. (c) 2005 Elsevier B.V. All rights reserved.
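As a generic illustration of what combining deterministic annealing with EM can look like, the sketch below fits the two means of a toy one-dimensional Gaussian mixture. It is not the model or the learning algorithm of the paper, whose details are not given in this abstract: the function name `daem_two_means`, the fixed equal mixing weights, the shared `sigma`, the schedule `betas`, and the small perturbation `eps` are all assumptions made for illustration. The only change from plain EM is in the E-step, where the likelihoods are raised to the power `beta` (the inverse temperature), so that responsibilities are nearly uniform at high temperature and sharpen as the temperature is lowered.

```python
import math


def daem_two_means(data, sigma=1.0, betas=(0.1, 0.2, 0.4, 0.7, 1.0),
                   iters=50, eps=1e-3):
    """Toy deterministic-annealing EM: two equal-weight 1D Gaussians, shared sigma.

    Hypothetical illustration only; not the algorithm of the paper.
    """
    mean = sum(data) / len(data)
    mu = [mean, mean]  # both components start at the data mean
    for beta in betas:  # beta = 1 / temperature, increasing as we cool
        # Small deterministic perturbation so the components can separate
        # once the temperature is low enough.
        mu[0] -= eps
        mu[1] += eps
        for _ in range(iters):
            # E-step: tempered responsibilities, proportional to likelihood ** beta.
            resp = []
            for x in data:
                logw = [-beta * (x - m) ** 2 / (2.0 * sigma ** 2) for m in mu]
                top = max(logw)  # subtract the max for numerical stability
                w = [math.exp(v - top) for v in logw]
                total = sum(w)
                resp.append([v / total for v in w])
            # M-step: responsibility-weighted means, as in ordinary EM.
            for k in range(2):
                weight = sum(r[k] for r in resp)
                mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / weight
    return mu


if __name__ == "__main__":
    points = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
    print(daem_two_means(points))
```

On this toy data the two means stay merged near the overall mean during the first, high-temperature stages and separate toward the two clusters only once `beta` is large enough, which is the behaviour the annealing schedule is meant to produce. Every step is deterministic, in contrast to stochastic sampling strategies.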

Pages: 9-18

Page count: 10
