Aligning biological sequences by exploiting residue conservation and coevolution

Cited by: 8
Authors
Muntoni, Anna Paola [1 ,2 ,3 ]
Pagnani, Andrea [1 ,4 ,5 ]
Weigt, Martin [3 ]
Zamponi, Francesco [2 ]
Affiliations
[1] Politecn Torino, Dept Appl Sci & Technol DISAT, Corso Duca Abruzzi 24, I-10129 Turin, Italy
[2] Univ Paris, Sorbonne Univ, Univ PSL, Lab Phys,ENS,CNRS, F-75005 Paris, France
[3] Sorbonne Univ, CNRS, Inst Biol Paris Seine, Biol Computat & Quantitat LCQB, F-75005 Paris, France
[4] IRCCS Candiolo, Italian Inst Genom Med, SP 142, I-10060 Candiolo, TO, Italy
[5] Ist Nazl Fis Nucl, Sez Torino, Via Giuria 1, I-10125 Turin, Italy
Funding
EU Horizon 2020;
DOI
10.1103/PhysRevE.102.062409
Chinese Library Classification (CLC) number
O35 [Fluid mechanics]; O53 [Plasma physics];
Discipline classification codes
070204 ; 080103 ; 080704 ;
Abstract
Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e., arranging sequences from different organisms in such a way as to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position-specific features such as conservation in sequences but assume that different positions evolve independently. Over recent years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can capture the coevolution signal in sequence ensembles, and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy. It overcomes the limitations of profile models, includes coevolution among positions in a general way, and is therefore universally applicable to protein- and RNA-sequence alignment without the need for complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.
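The contrast the abstract draws between profile models and coevolution-aware models can be illustrated with a minimal sketch. This is not DCAlign and involves no message passing: the toy alignment, the function names, and the pseudocount are all assumptions made for illustration, and the pairwise log-odds term is only a crude stand-in for the couplings that inverse statistical physics methods would infer.

```python
import math
from collections import Counter
from itertools import combinations

ALPHABET = "ACGU"
# Hypothetical toy alignment: the two columns covary (A pairs with U, G with C).
MSA = ["AU", "AU", "GC", "GC"]
PSEUDO = 0.01  # assumed pseudocount, avoids log(0)

def site_freqs(msa):
    """Per-column symbol frequencies: all that a profile model sees."""
    n, q = len(msa), len(ALPHABET)
    freqs = []
    for i in range(len(msa[0])):
        counts = Counter(seq[i] for seq in msa)
        freqs.append({a: (counts[a] + PSEUDO) / (n + PSEUDO * q) for a in ALPHABET})
    return freqs

def pair_freqs(msa):
    """Joint frequencies for every pair of columns (i < j)."""
    n, q = len(msa), len(ALPHABET)
    out = {}
    for i, j in combinations(range(len(msa[0])), 2):
        counts = Counter((seq[i], seq[j]) for seq in msa)
        out[(i, j)] = {(a, b): (counts[(a, b)] + PSEUDO) / (n + PSEUDO * q * q)
                       for a in ALPHABET for b in ALPHABET}
    return out

def profile_score(seq, f):
    """Log-probability under independent positions (profile assumption)."""
    return sum(math.log(f[i][a]) for i, a in enumerate(seq))

def pairwise_score(seq, f, fij):
    """Profile score plus a log-odds term per column pair (illustrative only)."""
    score = profile_score(seq, f)
    for (i, j), joint in fij.items():
        a, b = seq[i], seq[j]
        score += math.log(joint[(a, b)] / (f[i][a] * f[j][b]))
    return score

f = site_freqs(MSA)
fij = pair_freqs(MSA)
for candidate in ("AU", "AC"):
    print(candidate, profile_score(candidate, f), pairwise_score(candidate, f, fij))
```

In this toy case the profile score cannot tell the covarying pair "AU" from the never-observed pair "AC", because each column on its own is equally happy with either symbol. The pairwise term separates them, which is the kind of signal the abstract says profile models miss.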
Pages: 20