Algorithms for matching partially labelled sequence graphs

Cited by: 0

Authors:

Taylor, William R. [1]

Affiliations:

[1] Francis Crick Inst, 1 Midland Rd, London NW1 1AT, England

Source:

ALGORITHMS FOR MOLECULAR BIOLOGY | 2017 / Volume 12

Funding:

Wellcome Trust (UK);

Keywords:

Phylogenetic tree matching; Correlated substitution analysis; Bipartite graph matching; PROTEIN; CONTACTS; COEVOLUTION; PREDICTION;

DOI:

10.1186/s13015-017-0115-y

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Background: To find correlated pairs of positions between proteins, which are useful in predicting interactions, it is necessary to concatenate two large multiple sequence alignments such that the sequences that are joined together are those that interact in their species of origin. When each protein is unique within a species, the species name is sufficient to guide this match; however, when there are multiple related sequences (paralogs) in each species, the pairing is more difficult. In bacteria, a good guide can be gained from genome co-location, as interacting proteins tend to be in a common operon, but in eukaryotes this simple principle is not sufficient. Results: The methods developed in this paper take sets of paralogs for different proteins found in the same species and make a pairing based on their evolutionary distance relative to a set of other proteins that are unique and so have a known relationship (singletons). The former constitute a set of unlabelled nodes in a graph, while the latter are labelled. Two variants were tested: one based on a phylogenetic tree of the sequences (the topology-based method) and a simpler, faster variant based only on the inter-sequence distances (the distance-based method). Over a set of test proteins, both gave good results, with the topology-based method performing slightly better. Conclusions: The methods developed here still need refinement and augmentation from constraints other than the sequence data alone, such as known interactions from annotation and databases, or non-trivial relationships in genome location. With the ever-growing number of eukaryotic genomes, it is hoped that the methods described here will open a route to the use of these data equal to the current success attained with bacterial sequences.
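
The distance-based idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, only a toy under stated assumptions: each unlabelled paralog is described by a hypothetical "distance profile" (its evolutionary distances to the same ordered set of labelled singleton species), and paralogs from two protein families are paired one-to-one so that the total mismatch between profiles is smallest. The function names, the absolute-difference mismatch score, and the brute-force search are all illustrative choices, not taken from the source.

```python
from itertools import permutations


def profile_mismatch(p, q):
    """Sum of absolute differences between two equal-length distance profiles."""
    return sum(abs(x - y) for x, y in zip(p, q))


def pair_paralogs(profiles_a, profiles_b):
    """Return (pairing, cost) for the one-to-one pairing with the lowest total mismatch.

    profiles_a / profiles_b map a paralog name to its distance profile
    (assumed: distances to the same ordered list of singleton species).
    Brute force over permutations, so only suitable for a handful of paralogs.
    """
    names_a = sorted(profiles_a)
    names_b = sorted(profiles_b)
    if len(names_a) != len(names_b):
        raise ValueError("this sketch assumes equal numbers of paralogs")
    best_cost, best_pairs = float("inf"), None
    for perm in permutations(names_b):
        cost = sum(
            profile_mismatch(profiles_a[a], profiles_b[b])
            for a, b in zip(names_a, perm)
        )
        if cost < best_cost:
            best_cost, best_pairs = cost, dict(zip(names_a, perm))
    return best_pairs, best_cost


# Toy data (invented for illustration): two paralogs per family,
# each with distances to three singleton species.
profiles_a = {"A1": [0.10, 0.40, 0.80], "A2": [0.70, 0.30, 0.20]}
profiles_b = {"B1": [0.65, 0.35, 0.25], "B2": [0.15, 0.45, 0.75]}

pairs, cost = pair_paralogs(profiles_a, profiles_b)
print(pairs)  # {'A1': 'B2', 'A2': 'B1'}
```

For larger paralog sets, the exhaustive search would normally be replaced by a polynomial-time assignment solver; the sketch keeps to the standard library so that it stays self-contained.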


Pages: 22
