Comparison of large networks with sub-sampling strategies

Cited by: 10
Authors
Ali, Waqar [1]
Wegner, Anatol E. [1]
Gaunt, Robert E. [1]
Deane, Charlotte M. [1]
Reinert, Gesine [1]
Affiliations
[1] Univ Oxford, Dept Stat, 24-29 St Giles, Oxford OX1 3LB, England
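Source
SCIENTIFIC REPORTS | 2016 / Volume 6
Funding
Biotechnology and Biological Sciences Research Council (UK); Engineering and Physical Sciences Research Council (UK);
Keywords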
PROTEIN-INTERACTION NETWORKS; GLOBAL ALIGNMENT; RANDOM GRAPHS; EVOLUTION; DATABASE; MOTIFS;
DOI
10.1038/srep28955
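Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract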
Networks are routinely used to represent large data sets, making the comparison of networks a tantalizing research question in many areas. Techniques for such analysis vary from simply comparing network summary statistics to sophisticated but computationally expensive alignment-based approaches. Most existing methods either do not generalize well to different types of networks or do not provide a quantitative similarity score between networks. In contrast, alignment-free topology based network similarity scores empower us to analyse large sets of networks containing different types and sizes of data. Netdis is such a score that defines network similarity through the counts of small sub-graphs in the local neighbourhood of all nodes. Here, we introduce a sub-sampling procedure based on neighbourhoods which links naturally with the framework of network comparisons through local neighbourhood comparisons. Our theoretical arguments justify basing the Netdis statistic on a sample of similar-sized neighbourhoods. Our tests on empirical and synthetic datasets indicate that often only 10% of the neighbourhoods of a network suffice for optimal performance, leading to a drastic reduction in computational requirements. The sampling procedure is applicable even when only a small sample of the network is known, and thus provides a novel tool for network comparison of very large and potentially incomplete datasets.
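The neighbourhood sub-sampling idea described in the abstract can be illustrated with a minimal Python sketch. This is not the Netdis statistic itself: it only samples a fraction of nodes, extracts the neighbourhood around each sampled node, and tallies two small sub-graph counts (edges and triangles). The function names, the adjacency-dictionary input format, the default two-step radius, and the restriction to edges and triangles are all assumptions made for illustration; the sketch also omits the grouping of neighbourhoods by size that the abstract mentions.

```python
import random
from itertools import combinations


def ego_nodes(adj, centre, radius=2):
    """Return the set of nodes within `radius` steps of `centre`.

    `adj` is assumed to map each node to the set of its neighbours.
    """
    seen = {centre}
    frontier = {centre}
    for _ in range(radius):
        # Expand one step and keep only nodes not visited before.
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return seen


def count_edges_and_triangles(adj, nodes):
    """Count edges and triangles in the sub-graph induced by `nodes`.

    Node labels are assumed to be mutually comparable (e.g. integers),
    so each edge and each triangle is counted exactly once.
    """
    edges = sum(1 for u in nodes for v in adj[u] if v in nodes and u < v)
    triangles = 0
    for u in nodes:
        higher = [v for v in adj[u] if v in nodes and v > u]
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                triangles += 1
    return edges, triangles


def sampled_neighbourhood_counts(adj, fraction=0.1, radius=2, seed=0):
    """Sum sub-graph counts over the neighbourhoods of a random node sample."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    k = max(1, round(fraction * len(nodes)))
    centres = rng.sample(nodes, k)
    total_edges = 0
    total_triangles = 0
    for centre in centres:
        e, t = count_edges_and_triangles(adj, ego_nodes(adj, centre, radius))
        total_edges += e
        total_triangles += t
    return {"centres": k, "edges": total_edges, "triangles": total_triangles}
```

With `fraction=0.1`, only about one tenth of the neighbourhoods are examined, which is where the reduction in computation comes from; a comparison score would then be built from such sampled counts for each network.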
Pages: 8