Topology-based sparsification of graph annotations

Cited by: 2
Authors
Danciu, Daniel [1, 2]
Karasikov, Mikhail [1, 2, 3]
Mustafa, Harun [1, 2, 3]
Kahles, Andre [1, 2, 3]
Raetsch, Gunnar [1, 2, 3, 4]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Biomed Informat Grp, Zurich, Switzerland
[2] Univ Hosp Zurich, Biomed Informat Res, Zurich, Switzerland
[3] Swiss Inst Bioinformat, Zurich, Switzerland
[4] Swiss Fed Inst Technol, Dept Biol, Zurich, Switzerland
Funding
Swiss National Science Foundation
DOI
10.1093/bioinformatics/btab330
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Since the amount of published biological sequencing data is growing exponentially, efficient methods for storing and indexing this data are more needed than ever to truly benefit from this invaluable resource for biomedical research. Labeled de Bruijn graphs are a frequently used approach for representing large sets of sequencing data. While significant progress has been made to succinctly represent the graph itself, efficient methods for storing labels on such graphs are still rapidly evolving.

Results: In this article, we present RowDiff, a new technique for compacting graph labels by leveraging expected similarities in annotations of vertices adjacent in the graph. RowDiff can be constructed in linear time relative to the number of vertices and labels in the graph, and in space proportional to the graph size. In addition, construction can be efficiently parallelized and distributed, making the technique applicable to graphs with trillions of nodes. RowDiff can be viewed as an intermediary sparsification step of the original annotation matrix and can thus naturally be combined with existing generic schemes for compressed binary matrices. Experiments on 10 000 RNA-seq datasets show that RowDiff combined with multi-BRWT results in a 30% reduction in annotation footprint over Mantis-MST, the previously known most compact annotation representation. Experiments on the sparser Fungi subset of the RefSeq collection show that applying RowDiff sparsification reduces the size of individual annotation columns stored as compressed bit vectors by an average factor of 42. When combining RowDiff with a multi-BRWT representation, the resulting annotation is 26 times smaller than Mantis-MST.

Availability and implementation: RowDiff is implemented in C++ within the MetaGraph framework. The source code and the data used in the experiments are publicly available at https://github.com/ratschlab/row_diff.
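
The abstract does not spell out the construction, so the following is only a toy sketch of the general idea of exploiting similar annotations on adjacent vertices. It assumes that every vertex has one designated successor, that a chosen set of "anchor" vertices keeps its full label set, and that every other vertex stores only the symmetric difference between its labels and its successor's labels. The names `sparsify`, `reconstruct`, `successor`, and `anchors` are hypothetical and are not taken from the MetaGraph code base.

```python
# Illustrative sketch only: the function names, the successor map, and the
# anchor rule are assumptions, not the actual RowDiff/MetaGraph API.

def sparsify(rows, successor, anchors):
    """Store each non-anchor row as its symmetric difference with the
    successor's row; anchor rows are stored in full."""
    diffs = {}
    for node, labels in rows.items():
        if node in anchors:
            diffs[node] = set(labels)
        else:
            diffs[node] = set(labels) ^ set(rows[successor[node]])
    return diffs


def reconstruct(node, diffs, successor, anchors):
    """Recover a full row by XOR-ing the stored diffs along the successor
    path until an anchor is reached (assumes every path ends at an anchor)."""
    labels = set()
    while True:
        labels ^= diffs[node]
        if node in anchors:
            return labels
        node = successor[node]


# Toy path a -> b -> c -> d, with d as the only anchor.
rows = {"a": {1, 2}, "b": {1, 2}, "c": {1, 2, 3}, "d": {1, 2, 3}}
successor = {"a": "b", "b": "c", "c": "d"}
anchors = {"d"}

diffs = sparsify(rows, successor, anchors)
stored_before = sum(len(r) for r in rows.values())
stored_after = sum(len(d) for d in diffs.values())
```

In this toy example, neighbouring vertices mostly share labels, so most stored differences are empty and the number of stored entries drops. The sparsified rows could then be handed to any generic compressed binary matrix scheme, which is the combination the abstract describes.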
Pages: I169-I176
Number of pages: 8