Algorithms for matching partially labelled sequence graphs

Authors
Taylor, William R. [1]
Affiliations
[1] Francis Crick Inst, 1 Midland Rd, London NW1 1AT, England
Funding
Wellcome Trust (UK);
Keywords
Phylogenetic tree matching; Correlated substitution analysis; Bipartite graph matching; PROTEIN; CONTACTS; COEVOLUTION; PREDICTION;
DOI
10.1186/s13015-017-0115-y
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: To find correlated pairs of positions between proteins, which are useful in predicting interactions, it is necessary to concatenate two large multiple sequence alignments such that the sequences joined together are those that interact in their species of origin. When each protein is unique, the species name is sufficient to guide this match; however, when there are multiple related sequences (paralogs) in each species, the pairing is more difficult. In bacteria, a good guide can be gained from genome co-location, as interacting proteins tend to be in a common operon, but in eukaryotes this simple principle is not sufficient.
Results: The methods developed in this paper take sets of paralogs for different proteins found in the same species and make a pairing based on their evolutionary distance relative to a set of other proteins that are unique and so have a known relationship (singletons). The former constitute a set of unlabelled nodes in a graph, while the latter are labelled. Two variants were tested: one based on a phylogenetic tree of the sequences (the topology-based method) and a simpler, faster variant based only on the inter-sequence distances (the distance-based method). Over a set of test proteins, both gave good results, with the topology method performing slightly better.
Conclusions: The methods developed here still need refinement and augmentation from constraints other than the sequence data alone, such as known interactions from annotation and databases, or non-trivial relationships in genome location. With the ever-growing numbers of eukaryotic genomes, it is hoped that the methods described here will open a route to the use of these data equal to the current success attained with bacterial sequences.
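The abstract does not give the details of the distance-based method, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes that each paralog is summarised by a profile of distances to the singleton (labelled) sequences of its own family, listed in the same species order for both families, and that paralogs with similar profiles should be paired. The function names, the absolute-difference cost, and the exhaustive search over pairings are all illustrative choices; the numbers are made up.

```python
from itertools import permutations


def profile_cost(u, v):
    """Sum of absolute differences between two distance profiles (assumed cost)."""
    return sum(abs(x - y) for x, y in zip(u, v))


def pair_paralogs(profiles_a, profiles_b):
    """Return the minimum-cost one-to-one pairing of A paralogs with B paralogs.

    profiles_a and profiles_b map a paralog name to its distances to the
    singleton sequences of its own family, in the same species order.
    Exhaustive search: only practical for a handful of paralogs per species.
    """
    names_a = sorted(profiles_a)
    names_b = sorted(profiles_b)
    if len(names_a) != len(names_b):
        raise ValueError("this sketch assumes equal numbers of paralogs")

    best_cost, best_pairs = float("inf"), None
    for perm in permutations(names_b):
        cost = sum(
            profile_cost(profiles_a[a], profiles_b[b])
            for a, b in zip(names_a, perm)
        )
        if cost < best_cost:
            best_cost, best_pairs = cost, list(zip(names_a, perm))
    return best_pairs, best_cost


# Hypothetical data: two paralogs of each protein in one species,
# each with distances to singletons from three reference species.
profiles_a = {"A1": [0.10, 0.20, 0.30], "A2": [0.50, 0.60, 0.70]}
profiles_b = {"B1": [0.55, 0.58, 0.72], "B2": [0.12, 0.18, 0.33]}

pairs, cost = pair_paralogs(profiles_a, profiles_b)
print(pairs, round(cost, 2))
```

For larger paralog sets, the exhaustive search would normally be replaced by a polynomial-time bipartite assignment solver; that substitution is a design choice of this sketch and is not taken from the record.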
Pages: 22
Related papers
50 in total
  • [41] Distributed-Memory Algorithms for Maximum Cardinality Matching in Bipartite Graphs
    Azad, Ariful
    Buluc, Aydin
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), 2016, : 32 - 42
  • [42] (Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs
    Barak, Boaz
    Chou, Chi-Ning
    Lei, Zhixian
    Schramm, Tselil
    Sheng, Yueqi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [43] Correction to: Markovian Online Matching Algorithms on Large Bipartite Random Graphs
    Mohamed Habib Aliou Diallo Aoudi
    Pascal Moyal
    Vincent Robin
    Methodology and Computing in Applied Probability, 2022, 24 : 3227 - 3227
  • [44] Efficient Algorithms for Maximum Induced Matching Problem in Permutation and Trapezoid Graphs
    Viet Dung Nguyen
    Ba Thai Pham
    Phan Thuan
    FUNDAMENTA INFORMATICAE, 2021, 182 (03) : 257 - 283
  • [45] COMPUTATIONAL ALGORITHMS FOR MATCHING POLYNOMIALS OF GRAPHS FROM THE CHARACTERISTIC-POLYNOMIALS OF EDGE-WEIGHTED GRAPHS
    HOSOYA, H
    BALASUBRAMANIAN, K
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 1989, 10 (05) : 698 - 710
  • [46] EULER GRAPHS ON LABELLED NODES
    READ, RC
    CANADIAN JOURNAL OF MATHEMATICS, 1962, 14 (03): : 482 - &
  • [47] Labelled packing functions in graphs
    Hinrichsen, Erica G.
    Leoni, Valeria A.
    Safe, Martin D.
    INFORMATION PROCESSING LETTERS, 2020, 154 (154)
  • [48] Enumeration of Labelled Essential Graphs
    Steinsky, Bertran
    ARS COMBINATORIA, 2013, 111 : 485 - 494
  • [49] Efficient coding of labelled graphs
    Almudevar, Anthony
    2007 IEEE INFORMATION THEORY WORKSHOP, VOLS 1 AND 2, 2007, : 523 - 528
  • [50] C*-algebras of labelled graphs
    Bates, Teresa
    Pask, David
    JOURNAL OF OPERATOR THEORY, 2007, 57 (01) : 207 - 226