AN EFFICIENT ALGORITHM FOR CHINESE POSTMAN WALK ON BI-DIRECTED DE BRUIJN GRAPHS

Cited by: 2
Authors
Kundeti, Vamsi [1]
Rajasekaran, Sanguthevar [1]
Dinh, Heiu [1]
Affiliations
[1] University of Connecticut, Department of Computer Science and Engineering, Storrs, CT 06269, USA
Funding
National Science Foundation (USA)
Keywords
Sequence assembly algorithms; bioinformatics; Chinese Postman problem; bi-directed graphs
DOI
10.1142/S179383091250019X
Chinese Library Classification
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Sequence assembly from short reads is an important problem in biology. It is known that solving the sequence assembly problem exactly on a bi-directed de Bruijn graph or a string graph is intractable. However, finding a shortest double stranded DNA string (SDDNA) containing all the k-long words in the reads seems to be a good heuristic to get close to the original genome. This problem is equivalent to finding a cyclic Chinese Postman (CP) walk on the underlying unweighted bi-directed de Bruijn graph built from the reads. The Chinese Postman walk Problem (CPP) is solved by reducing it to a general bi-directed flow on this graph, which runs in $O(|E|^2 \log^2(|V|))$ time. In this paper we show that the cyclic CPP on bi-directed graphs can be solved without reducing it to bi-directed flow. We present a $\Theta(p(|V| + |E|) \log(|V|) + (d_{\max} p)^3)$ time algorithm to solve the cyclic CPP on a weighted bi-directed de Bruijn graph, where $p = \max\{|\{v \mid d_{in}(v) - d_{out}(v) > 0\}|, |\{v \mid d_{in}(v) - d_{out}(v) < 0\}|\}$ and $d_{\max} = \max_v |d_{in}(v) - d_{out}(v)|$. Our algorithm performs asymptotically better than the bi-directed flow algorithm when the number of imbalanced nodes $p$ is much smaller than the number of nodes in the bi-directed graph. In our experiments on various datasets, the value of $p/|V|$ lies between 0.08% and 0.13% with 95% probability. Many practical bi-directed de Bruijn graphs do not have cyclic CP walks. In such cases it is not clear how the bi-directed flow can be useful in identifying contigs. Our algorithm can handle such situations and identify maximal bi-directed sub-graphs that have CP walks. A $\Theta(p(|V| + |E|))$ time heuristic algorithm based on these ideas has been implemented for the SDDNA problem. This algorithm was tested on short reads from a plant genome and achieves an approximation ratio of at most 1.0134.
We also present a $\Theta((|V| + |E|) \log(|V|))$ time algorithm for the single source shortest path problem on bi-directed de Bruijn graphs, which may be of independent interest.
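The two parameters in the running time, p and d_max, depend only on the in-degree and out-degree of each node. The following Python sketch illustrates how they could be computed; it is not the authors' implementation. The edge encoding is an assumption made for illustration: each bi-directed edge is a tuple (u, u_end, v, v_end), where each `*_end` is "in" if the edge's arrowhead at that endpoint points into the node and "out" otherwise. The function name `imbalance_stats` is hypothetical.

```python
from collections import defaultdict


def imbalance_stats(edges):
    """Return (p, d_max) for a bi-directed multigraph.

    Assumed encoding (not from the paper): each edge is
    (u, u_end, v, v_end), with u_end and v_end equal to "in" or "out"
    depending on the arrowhead orientation at that endpoint.
    """
    d_in = defaultdict(int)
    d_out = defaultdict(int)
    for u, u_end, v, v_end in edges:
        for node, end in ((u, u_end), (v, v_end)):
            if end == "in":
                d_in[node] += 1
            elif end == "out":
                d_out[node] += 1
            else:
                raise ValueError(f"unknown orientation: {end!r}")

    nodes = set(d_in) | set(d_out)
    # Signed imbalance d_in(v) - d_out(v) for every node that has an edge.
    diff = {v: d_in[v] - d_out[v] for v in nodes}

    surplus = sum(1 for d in diff.values() if d > 0)  # d_in > d_out
    deficit = sum(1 for d in diff.values() if d < 0)  # d_in < d_out
    p = max(surplus, deficit)
    d_max = max((abs(d) for d in diff.values()), default=0)
    return p, d_max


# Node "a" has two outgoing ends, node "c" has two incoming ends,
# and node "b" is balanced.
example_edges = [
    ("a", "out", "b", "in"),
    ("b", "out", "c", "in"),
    ("a", "out", "c", "in"),
]
p, d_max = imbalance_stats(example_edges)
```

When p is small relative to |V|, as the abstract reports for practical datasets, the p(|V| + |E|) log(|V|) term stays close to the cost of a handful of shortest-path computations.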
Pages: 16