Estimating dominance norms of multiple data streams

Authors
Cormode, G [1]
Muthukrishnan, S
Affiliations
[1] Rutgers State Univ, Ctr Discrete Math & Comp Sci, Piscataway, NJ 08855 USA
[2] Rutgers State Univ, Div Comp Sci, Piscataway, NJ 08855 USA
Source
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202; 0835;
Abstract
There is much focus in the algorithms and database communities on designing tools to manage and mine data streams. Typically, data streams consist of multiple signals. Formally, a stream of multiple signals is $(i, a_{i,j})$, where the $i$'s correspond to the domain, the $j$'s index the different signals, and $a_{i,j} > 0$ gives the value of the $j$th signal at point $i$. We study the problem of finding norms that are cumulative of the multiple signals in the data stream. For example, consider the max-dominance norm, defined as $\sum_i \max_j \{a_{i,j}\}$. It may be thought of as estimating the norm of the "upper envelope" of the multiple signals, or alternatively, as estimating the norm of the "marginal" distribution of tabular data streams. It is used in applications to estimate the "worst case influence" of multiple processes, for example in IP traffic analysis, electrical grid monitoring, and the financial domain. In addition, it is a natural measure, generalizing the union of data streams or counting distinct elements in data streams. We present the first known data stream algorithms for estimating max-dominance of multiple signals. In particular, we use workspace and time-per-item that are both sublinear (in fact, poly-logarithmic) in the input size. In contrast, other notions of dominance on streams $a$, $b$, namely min-dominance ($\sum_i \min_j \{a_{i,j}\}$), count-dominance ($|\{i \mid a_i > b_i\}|$), and relative-dominance ($\sum_i a_i / \max\{1, b_i\}$), are all impossible to estimate accurately with sublinear space.
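To make the definition concrete, the following is a minimal Python sketch that computes the max-dominance norm exactly by storing one running maximum per domain point. It is not the poly-logarithmic streaming algorithm described in the abstract: it uses space linear in the number of distinct points and only illustrates what quantity is being estimated. The function name `max_dominance` and the `(i, j, value)` tuple layout are assumptions made for this illustration.

```python
def max_dominance(stream):
    """Exact max-dominance norm: sum over i of max over j of a[i, j].

    `stream` is an iterable of (i, j, value) tuples (an assumed layout),
    where i is the domain point, j the signal index, and value > 0.
    This baseline keeps one entry per distinct i, so it is NOT sublinear.
    """
    upper_envelope = {}  # i -> largest value seen so far at point i
    for i, _j, value in stream:
        if value > upper_envelope.get(i, 0):
            upper_envelope[i] = value
    return sum(upper_envelope.values())


# Two signals (j = 0, 1) over points i = 0, 1, 2, arriving in arbitrary order.
example = [(0, 0, 3), (0, 1, 5), (1, 0, 2), (2, 1, 4), (1, 1, 1)]
total = max_dominance(example)

# When every value is 1, the norm reduces to counting distinct points i.
distinct = max_dominance([(7, 0, 1), (7, 1, 1), (9, 0, 1)])
```

In the first example the per-point maxima are 5, 2, and 4. The second call shows the special case mentioned in the abstract, where max-dominance generalizes counting distinct elements.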
Pages: 148-160
Number of pages: 13