Aligning biological sequences by exploiting residue conservation and coevolution

Cited by: 8
Authors
Muntoni, Anna Paola [1, 2, 3]
Pagnani, Andrea [1, 4, 5]
Weigt, Martin [3]
Zamponi, Francesco [2]
Affiliations
[1] Politecn Torino, Dept Appl Sci & Technol DISAT, Corso Duca Abruzzi 24, I-10129 Turin, Italy
[2] Univ Paris, Sorbonne Univ, Univ PSL, Lab Phys, ENS, CNRS, F-75005 Paris, France
[3] Sorbonne Univ, CNRS, Inst Biol Paris Seine, Biol Computat & Quantitat LCQB, F-75005 Paris, France
[4] IRCCS Candiolo, Italian Inst Genom Med, SP 142, I-10060 Candiolo, TO, Italy
[5] Ist Nazl Fis Nucl, Sez Torino, Via Giuria 1, I-10125 Turin, Italy
Funding
European Union Horizon 2020;
Keywords
60;
DOI
10.1103/PhysRevE.102.062409
Chinese Library Classification (CLC) numbers
O35 [Fluid mechanics]; O53 [Plasma physics];
Discipline classification codes
070204; 080103; 080704;
Abstract
Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is sequence alignment, i.e., arranging sequences from different organisms so as to identify similar regions, detect evolutionary relationships between sequences, and predict biomolecular structure and function. This problem is typically addressed through profile models, which capture position-specific properties such as conservation but assume that different positions evolve independently. In recent years, it has become well established that the coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can capture the coevolutionary signal in sequence ensembles, and they are now widely used to predict protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy. DCAlign is able to overcome the limitations of profile models and to include coevolution among positions in a general way; it is therefore universally applicable to protein- and RNA-sequence alignment without the need for complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.
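To make the contrast between profile models and coevolution-based models concrete, the toy sketch below scores an already aligned, gapless sequence in two ways: with per-position fields only (an independent-site profile) and with fields plus pairwise couplings (a Potts model of the kind used in direct coupling analysis). It illustrates only the scoring idea, not DCAlign's message-passing alignment. The functions `profile_score` and `potts_score`, the three-position fields `h`, and the coupling `J` that rewards a K/E pair between positions 0 and 2 are all hypothetical. Scores are taken as the negative of the Potts energy, so higher means more favorable.

```python
from itertools import combinations

def profile_score(seq, h):
    """Independent-site (profile) score: each position i contributes h[i][a] on its own."""
    return sum(h[i].get(a, 0.0) for i, a in enumerate(seq))

def potts_score(seq, h, J):
    """Potts/DCA-style score: profile fields plus pairwise couplings J[(i, j, a, b)], i < j."""
    pair_term = sum(
        J.get((i, j, seq[i], seq[j]), 0.0)
        for i, j in combinations(range(len(seq)), 2)
    )
    return profile_score(seq, h) + pair_term

# Hypothetical toy parameters for a 3-position alignment.
# Fields: positions 0 and 2 tolerate K or E equally; position 1 prefers A.
h = [{"K": 1.0, "E": 1.0}, {"A": 1.0}, {"K": 1.0, "E": 1.0}]
# Couplings: positions 0 and 2 are rewarded only when they carry the
# complementary pair K/E or E/K (a made-up stand-in for a coevolving contact).
J = {(0, 2, "K", "E"): 2.0, (0, 2, "E", "K"): 2.0}

for s in ["KAE", "KAK", "EAK", "EAE"]:
    print(f"{s}: profile={profile_score(s, h)}, potts={potts_score(s, h, J)}")
# KAE: profile=3.0, potts=5.0
# KAK: profile=3.0, potts=3.0
# EAK: profile=3.0, potts=5.0
# EAE: profile=3.0, potts=3.0
```

Under the profile model all four sequences score the same, because each position is judged independently; only the coupling term separates the sequences in which positions 0 and 2 carry a co-occurring K/E pair from those that do not.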
Pages: 20