Topology-based sparsification of graph annotations

Cited by: 2
Authors
Danciu, Daniel [1, 2]
Karasikov, Mikhail [1, 2, 3]
Mustafa, Harun [1, 2, 3]
Kahles, Andre [1, 2, 3]
Raetsch, Gunnar [1, 2, 3, 4]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Biomed Informat Grp, Zurich, Switzerland
[2] Univ Hosp Zurich, Biomed Informat Res, Zurich, Switzerland
[3] Swiss Inst Bioinformat, Zurich, Switzerland
[4] Swiss Fed Inst Technol, Dept Biol, Zurich, Switzerland
Funding
Swiss National Science Foundation
DOI
10.1093/bioinformatics/btab330
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Since the amount of published biological sequencing data is growing exponentially, efficient methods for storing and indexing this data are more needed than ever to truly benefit from this invaluable resource for biomedical research. Labeled de Bruijn graphs are a frequently used approach for representing large sets of sequencing data. While significant progress has been made to succinctly represent the graph itself, efficient methods for storing labels on such graphs are still rapidly evolving.
Results: In this article, we present RowDiff, a new technique for compacting graph labels by leveraging expected similarities in annotations of vertices adjacent in the graph. RowDiff can be constructed in linear time relative to the number of vertices and labels in the graph, and in space proportional to the graph size. In addition, construction can be efficiently parallelized and distributed, making the technique applicable to graphs with trillions of nodes. RowDiff can be viewed as an intermediary sparsification step of the original annotation matrix and can thus naturally be combined with existing generic schemes for compressed binary matrices. Experiments on 10 000 RNA-seq datasets show that RowDiff combined with multi-BRWT results in a 30% reduction in annotation footprint over Mantis-MST, previously the most compact known annotation representation. Experiments on the sparser Fungi subset of the RefSeq collection show that applying RowDiff sparsification reduces the size of individual annotation columns stored as compressed bit vectors by an average factor of 42. When combining RowDiff with a multi-BRWT representation, the resulting annotation is 26 times smaller than Mantis-MST.
Availability and implementation: RowDiff is implemented in C++ within the MetaGraph framework. The source code and the data used in the experiments are publicly available at https://github.com/ratschlab/row_diff.
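The core idea in the Results, storing each vertex's labels as a difference against an adjacent vertex, can be illustrated with a small sketch. This is a simplified toy model and not the MetaGraph implementation: the names `sparsify` and `reconstruct`, the dictionary-based graph, and the rule that each vertex has one designated successor ending in an anchor vertex (successor `None`) are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Optional

Row = FrozenSet[str]  # set of labels attached to one vertex


def sparsify(rows: Dict[int, Row],
             successor: Dict[int, Optional[int]]) -> Dict[int, Row]:
    """Replace each non-anchor row by its symmetric difference with the
    row of its designated successor; anchors (successor None) keep the
    full row. Assumes successor chains are acyclic and end at an anchor."""
    diffs = {}
    for node, labels in rows.items():
        succ = successor[node]
        diffs[node] = labels if succ is None else labels ^ rows[succ]
    return diffs


def reconstruct(node: int,
                diffs: Dict[int, Row],
                successor: Dict[int, Optional[int]]) -> Row:
    """Recover the original row by XOR-ing the stored differences along
    the successor path up to and including the anchor."""
    labels = set()
    current: Optional[int] = node
    while current is not None:
        labels ^= diffs[current]
        current = successor[current]
    return frozenset(labels)


# Toy path 0 -> 1 -> 2 -> 3, with vertex 3 acting as the anchor.
rows = {
    0: frozenset({"A", "B"}),
    1: frozenset({"A", "B"}),
    2: frozenset({"A", "B", "C"}),
    3: frozenset({"A", "B", "C"}),
}
successor = {0: 1, 1: 2, 2: 3, 3: None}

diffs = sparsify(rows, successor)
stored_before = sum(len(r) for r in rows.values())
stored_after = sum(len(r) for r in diffs.values())
```

In this toy example, neighbouring vertices share most labels, so the differences are nearly empty and fewer set bits need to be stored. The sparsified rows could then be handed to any generic compressed binary matrix scheme, which is the combination the abstract describes.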
Pages: i169-i176
Page count: 8
Related papers
50 in total
  • [21] Topology-Based Quantitative Assessment of Structural Robustness
    高扬
    刘西拉
    Journal of Shanghai Jiaotong University(Science), 2014, 19 (03) : 257 - 264
  • [22] TopoAngler: Interactive Topology-based Extraction of Fishes
    Bock, Alexander
    Doraiswamy, Harish
    Summers, Adam
    Silva, Claudio
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2018, 24 (01) : 812 - 821
  • [23] A topology-based algorithm for directed network alignment
    Liu, F. (liufu@jlu.edu.cn), 1600, Universitas Ahmad Dahlan, Jalan Kapas 9, Semaki, Umbul Harjo, Yogyakarta, 55165, Indonesia (11):
  • [24] Topology-based generation of sport training sessions
    Fister, Iztok, Jr.
    Fister, Dusan
    Fister, Iztok
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 12 (01) : 667 - 678
  • [25] TOPOLOGY-BASED INTERLOCKING OF ELECTRICAL SUBSTATIONS.
    Brand, K.P.
    Kopainsky, J.
    Wimmer, W.
    IEEE Transactions on Power Delivery, 1986, PWRD-1 (03): : 118 - 126
  • [26] DEPICT: A topology-based debugger for MPI programs
    Huband, S
    McDonald, C
    HIGH-LEVEL PARALLEL PROGRAMMING MODELS AND SUPPORTIVE ENVIRONMENTS, PROCEEDINGS, 2001, 2026 : 109 - 121
  • [27] Augmenting topology-based maps with geometric information
    Fabrizi, E
    Saffiotti, A
    INTELLIGENT AUTONOMOUS SYSTEMS 6, 2000, : 604 - 611
  • [28] TOSS: A Topology-based Scheduler for Storm Clusters
    Zhou, Yi
    Liu, Yangyang
    Zhang, Chaowei
    Peng, Xiaopu
    Qin, Xiao
    2020 IEEE 34TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2020), 2020, : 587 - 596
  • [29] Topology-based classification of tetrads and quadruplex structures
    Popenda, Mariusz
    Miskiewicz, Joanna
    Sarzynska, Joanna
    Zok, Tomasz
    Szachniuk, Marta
    BIOINFORMATICS, 2020, 36 (04) : 1129 - 1134
  • [30] A fuzzy topology-based maximum likelihood classification
    Liu, Kimfung
    Shi, Wenzhong
    Zhang, Hua
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2011, 66 (01) : 103 - 114