Aligning biological sequences by exploiting residue conservation and coevolution

Cited by: 8
Authors
Muntoni, Anna Paola [1, 2, 3]
Pagnani, Andrea [1, 4, 5]
Weigt, Martin [3]
Zamponi, Francesco [2]
Affiliations
[1] Politecn Torino, Dept Appl Sci & Technol DISAT, Corso Duca Abruzzi 24, I-10129 Turin, Italy
[2] Univ Paris, Sorbonne Univ, Univ PSL, Lab Phys,ENS,CNRS, F-75005 Paris, France
[3] Sorbonne Univ, CNRS, Inst Biol Paris Seine, Biol Computat & Quantitat LCQB, F-75005 Paris, France
[4] IRCCS Candiolo, Italian Inst Genom Med, SP 142, I-10060 Candiolo, TO, Italy
[5] Ist Nazl Fis Nucl, Sez Torino, Via Giuria 1, I-10125 Turin, Italy
Funding
EU Horizon 2020;
Keywords
60;
DOI
10.1103/PhysRevE.102.062409
Chinese Library Classification number
O35 [Fluid mechanics]; O53 [Plasma physics];
Discipline classification code
070204 ; 080103 ; 080704 ;
Abstract
Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e., arranging sequences from different organisms in such a way as to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position specificities like conservation in sequences but assume that different positions evolve independently. Over recent years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can capture the coevolution signal in sequence ensembles, and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy, which is able to overcome the limitations of profile models, to include coevolution among positions in a general way, and therefore to be universally applicable to protein- and RNA-sequence alignment without the need for complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.
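The contrast the abstract draws between profile models and coevolution-aware models can be illustrated with a toy scoring example. The sketch below is not DCAlign and does not perform alignment or message passing; it only compares two ways of scoring an already aligned sequence. The function names, the parameter containers `h` and `J`, and the length-4 RNA-like toy family are all assumptions made for illustration.

```python
def profile_score(seq, h):
    """Profile-style score: a sum of independent per-position terms h[i][symbol]."""
    return sum(h[i].get(a, 0.0) for i, a in enumerate(seq))


def pairwise_score(seq, h, J):
    """Profile score plus pairwise coupling terms J[(i, j)][(a, b)]."""
    score = profile_score(seq, h)
    for (i, j), table in J.items():
        score += table.get((seq[i], seq[j]), 0.0)
    return score


# Hypothetical parameters: no position prefers any symbol on its own,
# but positions 0 and 3 are rewarded when they form a complementary pair.
h = [{} for _ in range(4)]
J = {(0, 3): {("G", "C"): 1.0, ("C", "G"): 1.0, ("A", "U"): 1.0, ("U", "A"): 1.0}}

paired, unpaired = "GAAC", "GAAA"

# The profile score cannot tell the two sequences apart ...
print(profile_score(paired, h), profile_score(unpaired, h))
# ... while the pairwise score favors the sequence with the covarying pair.
print(pairwise_score(paired, h, J), pairwise_score(unpaired, h, J))
```

In this toy setting, any model built only from per-position terms assigns both sequences the same score, whereas the coupling term distinguishes them. This is the kind of signal the abstract says profile models miss.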
Pages: 20