GTED: Graph Traversal Edit Distance

Cited by: 1
Authors
Boroojeny, Ali Ebrahimpour [1]
Shrestha, Akash [1]
Sharifi-Zarchi, Ali [1, 2, 3]
Gallagher, Suzanne Renick [1]
Sahinalp, S. Cenk [4]
Chitsaz, Hamidreza [1]
Affiliations
[1] Colorado State Univ, Ft Collins, CO 80523 USA
[2] Royan Inst, Tehran, Iran
[3] Sharif Univ Technol, Tehran, Iran
[4] Indiana Univ, Bloomington, IN USA
Keywords
STRUCTURAL VARIATION; GENOME;
DOI
10.1007/978-3-319-89929-9_3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space, which renders graph kernels important for the aforementioned applications. In this paper, we give a new graph kernel which we call graph traversal edit distance (GTED). We introduce the GTED problem and give the first polynomial time algorithm for it. Informally, the graph traversal edit distance is the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. Also, GTED is motivated by and provides the first mathematical formalism for sequence co-assembly and de novo variation detection in bioinformatics. We demonstrate that GTED admits a polynomial time algorithm using a linear program in the graph product space that is guaranteed to yield an integer solution. To the best of our knowledge, this is the first approach to this problem. We also give a linear programming relaxation algorithm for a lower bound on GTED. We use GTED as a graph kernel and evaluate it by computing the accuracy of an SVM classifier on a few datasets in the literature. Our results suggest that our kernel outperforms many of the common graph kernels in the tested datasets. As a second set of experiments, we successfully cluster viral genomes using GTED on their assembly graphs obtained from de novo assembly of next generation sequencing reads. Our GTED implementation can be downloaded from http://chitsazlab.org/software/gted/.
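The informal definition in the abstract can be illustrated with a brute-force sketch for very small graphs. This is not the polynomial-time linear-programming algorithm described by the authors; it simply enumerates traversals and takes the minimum Levenshtein distance, so it runs in exponential time. Two assumptions are made here: an "Eulerian traversal" is taken to be any directed trail that uses every edge exactly once (starting from any vertex), and a graph is given as a list of `(source, target, label)` triples. The function names `edit_distance`, `eulerian_strings`, and `gted_bruteforce` are hypothetical and are not part of the authors' software.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (standard dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def eulerian_strings(edges):
    """Return the set of label strings spelled by trails that use every edge once.

    edges: list of (source, target, label) triples of a directed graph.
    Assumption: any start vertex is allowed; exhaustive backtracking search.
    """
    results = set()
    used = [False] * len(edges)

    def extend(node, labels):
        if len(labels) == len(edges):
            results.add("".join(labels))
            return
        for k, (u, v, lab) in enumerate(edges):
            if not used[k] and u == node:
                used[k] = True
                labels.append(lab)
                extend(v, labels)
                labels.pop()
                used[k] = False

    for start in {u for u, _, _ in edges}:
        extend(start, [])
    return results


def gted_bruteforce(edges1, edges2):
    """Minimum edit distance over all pairs of traversal strings (exponential time)."""
    s1, s2 = eulerian_strings(edges1), eulerian_strings(edges2)
    if not s1 or not s2:
        raise ValueError("each graph needs at least one Eulerian traversal")
    return min(edit_distance(a, b) for a, b in product(s1, s2))


# Toy example: a 4-cycle spelling ACGT and a 3-cycle spelling ACT.
g1 = [(0, 1, "A"), (1, 2, "C"), (2, 3, "G"), (3, 0, "T")]
g2 = [(0, 1, "A"), (1, 2, "C"), (2, 0, "T")]
print(gted_bruteforce(g1, g2))  # 1: delete G from ACGT to obtain ACT
```

The enumeration is only practical for graphs with a handful of edges, which is the reason the abstract emphasizes a polynomial-time formulation.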
Pages: 37-53
Page count: 17
Related papers
50 in total
  • [1] Graph Traversal Edit Distance and Extensions
    Ebrahimpour Boroojeny, Ali
    Shrestha, Akash
    Sharifi-Zarchi, Ali
    Gallagher, Suzanne Renick
    Sahinalp, S. Cenk
    Chitsaz, Hamidreza
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2020, 27 (03) : 317 - 329
  • [2] PyGTED: Python Application for Computing Graph Traversal Edit Distance
    Boroojeny, Ali Ebrahimpour
    Shrestha, Akash
    Sharifi-zarchi, Ali
    Gallagher, Suzanne Renick
    Sahinalp, Suleyman Cenk
    Chitsaz, Hamidreza
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2020, 27 (03) : 436 - 439
  • [3] Revisiting the complexity of and algorithms for the graph traversal edit distance and its variants
    Qiu, Yutong
    Shen, Yihang
    Kingsford, Carl
    ALGORITHMS FOR MOLECULAR BIOLOGY, 2024, 19 (01)
  • [4] Graph Edit Distance or Graph Edit Pseudo-Distance?
    Serratosa, Francesc
    Cortes, Xavier
    Moreno, Carlos-Francisco
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2016, 2016, 10029 : 530 - 540
  • [5] A survey of graph edit distance
    Xinbo Gao
    Bing Xiao
    Dacheng Tao
    Xuelong Li
    Pattern Analysis and Applications, 2010, 13 : 113 - 129
  • [6] Greedy Graph Edit Distance
    Riesen, Kaspar
    Ferrer, Miquel
    Dornberger, Rolf
    Bunke, Horst
    MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, MLDM 2015, 2015, 9166 : 3 - 16
  • [7] Bayesian graph edit distance
    Myers, R
    Wilson, RC
    Hancock, ER
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2000, 22 (06) : 628 - 635
  • [8] A survey of graph edit distance
    Gao, Xinbo
    Xiao, Bing
    Tao, Dacheng
    Li, Xuelong
    PATTERN ANALYSIS AND APPLICATIONS, 2010, 13 (01) : 113 - 129
  • [9] Redefining the Graph Edit Distance
    Serratosa F.
    SN Computer Science, 2021, 2 (6)
  • [10] Graph Edit Distance in the Exact Context
    Darwiche, Mostafa
    Raveaux, Romain
    Conte, Donatello
    T'Kindt, Vincent
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2018, 2018, 11004 : 304 - 314