GTED: Graph Traversal Edit Distance

Cited by: 1
Authors
Boroojeny, Ali Ebrahimpour [1]
Shrestha, Akash [1]
Sharifi-Zarchi, Ali [1, 2, 3]
Gallagher, Suzanne Renick [1]
Sahinalp, S. Cenk [4]
Chitsaz, Hamidreza [1]
Affiliations
[1] Colorado State Univ, Ft Collins, CO 80523 USA
[2] Royan Inst, Tehran, Iran
[3] Sharif Univ Technol, Tehran, Iran
[4] Indiana Univ, Bloomington, IN USA
Keywords
STRUCTURAL VARIATION; GENOME
DOI
10.1007/978-3-319-89929-9_3
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this paper, we give a new graph kernel which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the graph traversal edit distance is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of an SVM classifier on a few datasets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested datasets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next generation sequencing reads. Our GTED implementation can be downloaded from http://chitsazlab.org/software/gted/.
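The informal definition in the abstract can be illustrated with a minimal brute-force sketch. This is not the authors' polynomial-time linear-programming algorithm: it enumerates every Eulerian trail explicitly, so it takes exponential time and is only usable on toy graphs. The function names, the `(source, target, label)` edge representation, the use of directed graphs, and the unit-cost Levenshtein distance are all assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def eulerian_label_strings(edges):
    """Return the set of label strings spelled by all Eulerian trails.

    `edges` is a list of (source, target, label) triples describing a
    directed, edge-labeled multigraph (an assumed representation).
    """
    n = len(edges)
    results = set()

    def extend(node, used, labels):
        if len(labels) == n:  # every edge has been used exactly once
            results.add("".join(labels))
            return
        for k, (u, v, lab) in enumerate(edges):
            if k not in used and u == node:
                used.add(k)
                labels.append(lab)
                extend(v, used, labels)
                labels.pop()
                used.remove(k)

    for start in {u for u, _, _ in edges}:
        extend(start, set(), [])
    return results


def gted_bruteforce(edges1, edges2):
    """Minimum edit distance over all pairs of Eulerian label strings."""
    s1 = eulerian_label_strings(edges1)
    s2 = eulerian_label_strings(edges2)
    if not s1 or not s2:
        raise ValueError("both graphs need at least one Eulerian trail")
    return min(edit_distance(a, b) for a in s1 for b in s2)


# Toy example: a 3-cycle spelling A, C, G versus a path spelling A, C, T.
g1 = [(0, 1, "A"), (1, 2, "C"), (2, 0, "G")]
g2 = [(0, 1, "A"), (1, 2, "C"), (2, 3, "T")]
print(gted_bruteforce(g1, g2))
```

In this toy case the cycle yields one string per starting vertex while the path yields a single string, and the minimum is taken over all such pairs. The explicit enumeration is what becomes infeasible on real assembly graphs, which is the gap the linear program in the graph product space is meant to close.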
Pages: 37-53
Page count: 17