Multi-dimensional substring selectivity estimation

Authors
Jagadish, HV [1]
Kapitskaia, O [1]
Ng, RT [1]
Srivastava, D [1]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the explosion of the Internet, LDAP directories and XML, there is an ever greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the dimensions. Effective query optimization in this context requires good selectivity estimates. In this paper, we use multi-dimensional count-suffix trees as the basic framework for substring selectivity estimation. Given the enormous size of these trees for large databases, we develop a space- and time-efficient probabilistic algorithm to construct multi-dimensional pruned count-suffix trees directly. We then present two techniques to obtain good estimates for a given multi-dimensional substring matching query, using a pruned count-suffix tree. The first one, called GNO (for Greedy Non Overlap), generalizes the greedy parsing suggested by Krishnan et al. [9] for one-dimensional substring selectivity estimation. The second one, called MO (for Maximal Overlap), uses all maximal multi-dimensional substrings of the query for estimation; these multi-dimensional substrings help to capture the correlation that may exist between strings in the multiple dimensions. We demonstrate experimentally, using real data sets, that MO is substantially superior to GNO in the quality of the estimate.
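As a rough illustration of the greedy-parsing idea mentioned in the abstract, the following Python sketch works in one dimension only. It is not the paper's multi-dimensional construction or its GNO/MO estimators: a plain dictionary of substring counts stands in for a pruned count-suffix tree, the pieces of the parse are assumed independent, and the names `build_pruned_counts`, `greedy_estimate`, `min_count`, and `default_prob` are hypothetical choices made for this example.

```python
from collections import Counter


def build_pruned_counts(strings, min_count):
    """Map each substring to the number of strings containing it,
    keeping only substrings that reach min_count (the pruning step)."""
    counts = Counter()
    for s in strings:
        # Use a set so that each string contributes at most once per substring.
        seen = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        counts.update(seen)
    return {sub: c for sub, c in counts.items() if c >= min_count}


def greedy_estimate(query, pruned, total, default_prob):
    """Estimate the fraction of strings containing `query` by greedily
    splitting it into the longest pieces still present after pruning and
    multiplying their relative frequencies (independence is assumed)."""
    estimate = 1.0
    pos = 0
    while pos < len(query):
        end = len(query)
        # Shrink the candidate piece until it is found or becomes empty.
        while end > pos and query[pos:end] not in pruned:
            end -= 1
        if end == pos:
            # Even the single character was pruned: fall back to an assumed default.
            estimate *= default_prob
            pos += 1
        else:
            estimate *= pruned[query[pos:end]] / total
            pos = end
    return estimate


if __name__ == "__main__":
    data = ["abc", "abd", "bcd", "abc"]
    pruned = build_pruned_counts(data, min_count=2)
    print(greedy_estimate("abcd", pruned, total=len(data), default_prob=0.1))
```

Because the pieces do not overlap, a greedy parse of this kind cannot use what the tree knows about adjacent pieces occurring together. The abstract indicates that MO addresses this by using all maximal substrings of the query.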
Pages: 387-398
Number of pages: 8