Comparison of large networks with sub-sampling strategies

Cited by: 10
Authors
Ali, Waqar [1]
Wegner, Anatol E. [1]
Gaunt, Robert E. [1]
Deane, Charlotte M. [1]
Reinert, Gesine [1]
Affiliations
[1] Univ Oxford, Dept Stat, 24-29 St Giles, Oxford OX1 3LB, England
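Source
SCIENTIFIC REPORTS | 2016 / Volume 6
Funding
Biotechnology and Biological Sciences Research Council (UK); Engineering and Physical Sciences Research Council (UK)
Keywords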
PROTEIN-INTERACTION NETWORKS; GLOBAL ALIGNMENT; RANDOM GRAPHS; EVOLUTION; DATABASE; MOTIFS;
DOI
10.1038/srep28955
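Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract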
Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Most existing methods either do not generalize well to different types of networks or do not provide a quantitative similarity score between networks. In contrast, alignment-free topology based network similarity scores empower us to analyse large sets of networks containing different types and sizes of data. Netdis is such a score that defines network similarity through the counts of small sub-graphs in the local neighbourhood of all nodes. Here, we introduce a sub-sampling procedure based on neighbourhoods which links naturally with the framework of network comparisons through local neighbourhood comparisons. Our theoretical arguments justify basing the Netdis statistic on a sample of similar-sized neighbourhoods. Our tests on empirical and synthetic datasets indicate that often only 10% of the neighbourhoods of a network suffice for optimal performance, leading to a drastic reduction in computational requirements. The sampling procedure is applicable even when only a small sample of the network is known, and thus provides a novel tool for network comparison of very large and potentially incomplete datasets.
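The sub-sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the Netdis statistic itself: it only draws a random fraction of nodes, extracts each sampled node's local neighbourhood, and counts one kind of small sub-graph (triangles) inside it. The function names, the adjacency-dict graph representation, the neighbourhood radius of 2, and the choice of triangles are all assumptions made for illustration.

```python
import random
from collections import deque
from itertools import combinations


def ego_network(adj, root, radius=2):
    """Return the set of nodes within `radius` hops of `root` (breadth-first search).

    `adj` maps each node to the set of its neighbours (undirected graph).
    The default radius of 2 is an illustrative assumption.
    """
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == radius:
            continue  # do not expand beyond the neighbourhood boundary
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return seen


def count_triangles(adj, nodes):
    """Count triangles in the subgraph induced by `nodes`.

    Each triangle is counted once, at its smallest node, so node labels
    must be mutually comparable (e.g. integers).
    """
    total = 0
    for u in nodes:
        larger = [v for v in adj[u] if v in nodes and v > u]
        for v, w in combinations(larger, 2):
            if w in adj[v]:
                total += 1
    return total


def sampled_triangle_counts(adj, fraction=0.1, radius=2, seed=0):
    """Count triangles in the neighbourhoods of a random sample of nodes.

    Hypothetical helper: samples `fraction` of the nodes (at least one)
    without replacement and returns {root: triangle count}.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    k = max(1, round(fraction * len(nodes)))
    roots = rng.sample(nodes, k)
    return {r: count_triangles(adj, ego_network(adj, r, radius)) for r in roots}
```

With `fraction=0.1`, only about a tenth of the neighbourhoods are ever extracted and counted, which is where the computational saving described in the abstract comes from. A comparison score would still need to be built on top of these counts.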
Pages: 8