Topology-based sparsification of graph annotations

Times cited: 2
Authors
Danciu, Daniel [1,2]
Karasikov, Mikhail [1,2,3]
Mustafa, Harun [1,2,3]
Kahles, Andre [1,2,3]
Raetsch, Gunnar [1,2,3,4]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Biomed Informat Grp, Zurich, Switzerland
[2] Univ Hosp Zurich, Biomed Informat Res, Zurich, Switzerland
[3] Swiss Inst Bioinformat, Zurich, Switzerland
[4] Swiss Fed Inst Technol, Dept Biol, Zurich, Switzerland
Funding
Swiss National Science Foundation
DOI
10.1093/bioinformatics/btab330
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Because the amount of published biological sequencing data is growing exponentially, efficient methods for storing and indexing these data are needed more than ever to fully benefit from this invaluable resource for biomedical research. Labeled de Bruijn graphs are a frequently used approach for representing large sets of sequencing data. While significant progress has been made in succinctly representing the graph itself, efficient methods for storing labels on such graphs are still rapidly evolving.

Results: In this article, we present RowDiff, a new technique for compacting graph labels by leveraging expected similarities in the annotations of vertices that are adjacent in the graph. RowDiff can be constructed in time linear in the number of vertices and labels in the graph, and in space proportional to the graph size. In addition, construction can be efficiently parallelized and distributed, making the technique applicable to graphs with trillions of nodes. RowDiff can be viewed as an intermediate sparsification step applied to the original annotation matrix and can thus naturally be combined with existing generic schemes for compressed binary matrices. Experiments on 10 000 RNA-seq datasets show that RowDiff combined with multi-BRWT results in a 30% reduction in annotation footprint over Mantis-MST, previously the most compact known annotation representation. Experiments on the sparser Fungi subset of the RefSeq collection show that applying RowDiff sparsification reduces the size of individual annotation columns stored as compressed bit vectors by an average factor of 42. When RowDiff is combined with a multi-BRWT representation, the resulting annotation is 26 times smaller than Mantis-MST.

Availability and implementation: RowDiff is implemented in C++ within the MetaGraph framework. The source code and the data used in the experiments are publicly available at https://github.com/ratschlab/row_diff.
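The central idea in the Results paragraph, replacing each vertex's annotation row by its difference from the row of an adjacent vertex, can be illustrated with a small sketch. It is a toy illustration under stated assumptions, not the MetaGraph implementation: it assumes each vertex has at most one designated successor, that a row is a Python set of label IDs, that the "difference" is the symmetric difference (XOR of bit rows), and that a set of anchor vertices (plus any vertex without a successor) stores its full row. The names `rowdiff_sparsify`, `rowdiff_decode`, and `nnz` are hypothetical.

```python
from typing import Dict, Optional, Set

Row = Set[int]  # a label row: the set of label (column) IDs set for one vertex


def rowdiff_sparsify(
    rows: Dict[str, Row],
    succ: Dict[str, Optional[str]],
    anchors: Set[str],
) -> Dict[str, Row]:
    """Replace each non-anchor row by its symmetric difference with its successor's row."""
    diffs: Dict[str, Row] = {}
    for v, row in rows.items():
        s = succ.get(v)
        if v in anchors or s is None:
            # Assumption: anchors and vertices without a successor keep the full row.
            diffs[v] = set(row)
        else:
            # Only the labels that differ between v and its successor are stored.
            diffs[v] = row ^ rows[s]
    return diffs


def rowdiff_decode(
    v: str,
    diffs: Dict[str, Row],
    succ: Dict[str, Optional[str]],
    anchors: Set[str],
) -> Row:
    """Recover the original row of v by XOR-ing diffs along the successor chain.

    Assumes every successor chain reaches an anchor or a vertex without a
    successor; a cycle containing no anchor would loop forever.
    """
    result: Row = set()
    while True:
        result ^= diffs[v]
        s = succ.get(v)
        if v in anchors or s is None:
            return result  # reached a vertex that stores its full row
        v = s


def nnz(rows: Dict[str, Row]) -> int:
    """Number of set bits, a rough proxy for the storage cost of the matrix."""
    return sum(len(r) for r in rows.values())


# Hypothetical toy path A -> B -> C -> D where adjacent vertices mostly share labels.
rows = {"A": {1, 2}, "B": {1, 2}, "C": {1, 2, 3}, "D": {1, 2, 3}}
succ = {"A": "B", "B": "C", "C": "D", "D": None}
anchors: Set[str] = set()

diffs = rowdiff_sparsify(rows, succ, anchors)
print(nnz(rows), nnz(diffs))  # 10 4
assert all(rowdiff_decode(v, diffs, succ, anchors) == rows[v] for v in rows)
```

In this sketch, rows of neighbouring vertices that carry identical labels turn into empty diff rows, which is why the sparsified matrix is cheaper to store with a compressed binary matrix scheme. Decoding a row costs one step per vertex on the way to the nearest stored full row, so placing more anchors would trade some space for faster queries.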
Pages: I169-I176
Number of pages: 8