Topology-based sparsification of graph annotations

Cited by: 2
Authors
Danciu, Daniel [1, 2]
Karasikov, Mikhail [1, 2, 3]
Mustafa, Harun [1, 2, 3]
Kahles, Andre [1, 2, 3]
Raetsch, Gunnar [1, 2, 3, 4]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Biomed Informat Grp, Zurich, Switzerland
[2] Univ Hosp Zurich, Biomed Informat Res, Zurich, Switzerland
[3] Swiss Inst Bioinformat, Zurich, Switzerland
[4] Swiss Fed Inst Technol, Dept Biol, Zurich, Switzerland
Funding
Swiss National Science Foundation
DOI
10.1093/bioinformatics/btab330
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Since the amount of published biological sequencing data is growing exponentially, efficient methods for storing and indexing this data are more needed than ever to truly benefit from this invaluable resource for biomedical research. Labeled de Bruijn graphs are a frequently used approach for representing large sets of sequencing data. While significant progress has been made to succinctly represent the graph itself, efficient methods for storing labels on such graphs are still rapidly evolving.
Results: In this article, we present RowDiff, a new technique for compacting graph labels by leveraging expected similarities in annotations of vertices adjacent in the graph. RowDiff can be constructed in linear time relative to the number of vertices and labels in the graph, and in space proportional to the graph size. In addition, construction can be efficiently parallelized and distributed, making the technique applicable to graphs with trillions of nodes. RowDiff can be viewed as an intermediary sparsification step of the original annotation matrix and can thus naturally be combined with existing generic schemes for compressed binary matrices. Experiments on 10 000 RNA-seq datasets show that RowDiff combined with multi-BRWT results in a 30% reduction in annotation footprint over Mantis-MST, the previously known most compact annotation representation. Experiments on the sparser Fungi subset of the RefSeq collection show that applying RowDiff sparsification reduces the size of individual annotation columns stored as compressed bit vectors by an average factor of 42. When combining RowDiff with a multi-BRWT representation, the resulting annotation is 26 times smaller than Mantis-MST.
Availability and implementation: RowDiff is implemented in C++ within the MetaGraph framework. The source code and the data used in the experiments are publicly available at https://github.com/ratschlab/row_diff.
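The idea summarized under Results, storing a vertex's labels as a difference against an adjacent vertex, can be illustrated with a minimal Python sketch. This is not the MetaGraph implementation: the function names `row_diff_sparsify` and `row_diff_reconstruct`, the `succ` successor map, the explicit `anchors` set, and the toy graph are all hypothetical and chosen only for illustration. The sketch assumes every successor chain eventually reaches an anchor or a vertex without a successor.

```python
def row_diff_sparsify(rows, succ, anchors):
    """Replace each row by its symmetric difference with the successor's row.

    rows:    dict vertex -> set of label ids (the original annotation rows)
    succ:    dict vertex -> its single designated successor (hypothetical input)
    anchors: set of vertices whose rows are stored in full (hypothetical input)
    """
    sparse = {}
    for v, labels in rows.items():
        if v in anchors or v not in succ:
            # Anchors and chain ends keep their full row.
            sparse[v] = set(labels)
        else:
            # Adjacent vertices tend to share labels, so this set is small.
            sparse[v] = set(labels) ^ set(rows[succ[v]])
    return sparse


def row_diff_reconstruct(sparse, succ, anchors, v):
    """Recover the original row of v by XOR-ing diffs along its successor chain.

    Assumes the chain from v reaches an anchor or a vertex with no successor;
    otherwise this loop would not terminate.
    """
    labels = set()
    while True:
        labels ^= sparse[v]
        if v in anchors or v not in succ:
            return labels
        v = succ[v]


# Toy example: a path a -> b -> c -> d with d stored in full.
rows = {"a": {1, 2}, "b": {1, 2}, "c": {1, 2, 3}, "d": {1, 3}}
succ = {"a": "b", "b": "c", "c": "d"}
anchors = {"d"}

sparse = row_diff_sparsify(rows, succ, anchors)
original_bits = sum(len(r) for r in rows.values())
sparse_bits = sum(len(r) for r in sparse.values())
```

In this toy case the number of set bits drops from 9 to 4, and the sparsified rows could then be handed to any generic compressed binary matrix scheme. Reconstruction cost grows with the distance to the nearest anchor, which is why a bound on chain length matters in practice.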
Pages: i169-i176
Number of pages: 8