Distribution-free data density estimation in large-scale networks

Cited by: 0
Authors
Minqi Zhou
Rong Zhang
Weining Qian
Aoying Zhou
Affiliations
[1] East China Normal University, Data Science and Engineering Institute
[2] Wuhan University, State Key Lab of Software Engineering
Keywords
distribution-free; data density estimation; random sampling
DOI
Not available
Abstract
Estimating the global data distribution in large-scale networks is an important problem that has yet to be well addressed. A good estimate can benefit many applications, especially in the cloud computing era, such as load balancing analysis, query processing, and data mining. Inspired by the inversion method for random variate (number) generation, this paper presents a novel model, called distribution-free data density estimation, for large ring-based networks. It aims to achieve high estimation accuracy at low estimation cost regardless of the distribution model of the underlying data. The model generates random samples for an arbitrary distribution by sampling the global cumulative distribution function, and it is free from sampling bias. With this estimation method, data densities can be estimated over both one-dimensional and multidimensional tuple sets, where the domain of each dimension may be either continuous or discrete. In large-scale networks, the key idea of distribution-free estimation is to sample a small subset of peers in order to estimate the global data distribution over the data domain. We introduce algorithms for computing and sampling the global cumulative distribution function, from which the global data distribution is estimated, together with a detailed theoretical analysis. Our extensive performance study confirms the effectiveness and efficiency of our methods in large ring-based networks.
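The inversion method mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it assumes, purely for illustration, that each peer on the ring owns one contiguous key range and reports how many tuples it stores, and that the global cumulative distribution function is piecewise linear between range boundaries. The function names (`build_cdf`, `invert_cdf`, `estimate_density`) and the input layout are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def build_cdf(counts):
    """Cumulative fractions at each range edge, from 0.0 up to 1.0.

    counts[i] is the (assumed) number of tuples in the i-th key range.
    """
    total = sum(counts)
    return [0.0] + [c / total for c in accumulate(counts)]


def invert_cdf(boundaries, cdf, u):
    """Inversion method: return x with CDF(x) = u for u in [0, 1].

    boundaries holds the len(cdf) sorted range edges; the CDF is assumed
    to be linear inside each range.
    """
    # Locate the range whose cumulative interval contains u.
    i = min(bisect.bisect_right(cdf, u) - 1, len(cdf) - 2)
    lo, hi = cdf[i], cdf[i + 1]
    if hi == lo:  # empty range: no mass to interpolate over
        return boundaries[i]
    frac = (u - lo) / (hi - lo)
    return boundaries[i] + frac * (boundaries[i + 1] - boundaries[i])


def estimate_density(boundaries, counts, n_samples, n_bins, rng=random):
    """Estimate a density over equal-width bins from inversion samples."""
    cdf = build_cdf(counts)
    start, end = boundaries[0], boundaries[-1]
    width = (end - start) / n_bins
    hist = [0] * n_bins
    for _ in range(n_samples):
        x = invert_cdf(boundaries, cdf, rng.random())
        hist[min(int((x - start) / width), n_bins - 1)] += 1
    # Normalise so that the estimated density integrates to 1.
    return [h / (n_samples * width) for h in hist]
```

For example, `estimate_density([0, 10, 20, 30], [10, 30, 60], 10_000, 3)` draws uniform random numbers, maps each through the inverse CDF, and bins the results. Because every sample is obtained by inverting the CDF itself, no parametric form for the underlying data distribution is needed, which is the sense in which such an estimate is distribution-free.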
Pages: 1220-1240
Number of pages: 20