High-dimensional similarity retrieval using dimensional choice

Authors
Tahmoush, Dave [1]
Samet, Hanan [1]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
DOI
10.1109/SISAP.2008.20
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Several kinds of information can be used to improve the efficiency of similarity searches on high-dimensional data. The most commonly used is the distribution of the data itself, but dimensional choice based on the information in the query, together with the parameters of the distribution, can provide an effective improvement in query processing speed and storage. This method can produce dimension reduction by as much as a factor of n, the number of data points in the database, over sequential search. We demonstrate that the curse of dimensionality depends not on the dimension of the data itself but primarily on the effective dimension of the distance function. We also introduce a new distance function that uses fewer dimensions of the higher-dimensional space to produce a maximal lower-bound distance that approximates the full distance function. On a prostate cancer data set, this work demonstrates significant dimension reduction: up to 70% with an improvement in accuracy, or over 99% with only a 6% loss in accuracy.
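The lower-bound idea in the abstract can be illustrated with a minimal sketch. This is not the authors' distance function, which the abstract does not specify. It assumes a plain Euclidean distance, for which the distance over any subset of dimensions never exceeds the full distance. The names `partial_distance`, `choose_dims`, and `nearest_neighbor` are hypothetical, and the rule of picking the dimensions where the query deviates most from the data mean is an assumed stand-in for "dimensional choice".

```python
import math


def full_distance(query, point):
    """Euclidean distance over all dimensions."""
    return math.sqrt(sum((q - p) ** 2 for q, p in zip(query, point)))


def partial_distance(query, point, dims):
    """Euclidean distance over the chosen dimensions only.

    Dropping non-negative squared terms can only shrink the sum,
    so this is a lower bound on full_distance(query, point).
    """
    return math.sqrt(sum((query[d] - point[d]) ** 2 for d in dims))


def choose_dims(query, mean, k):
    """Assumed heuristic: keep the k dimensions where the query
    deviates most from the per-dimension data mean."""
    order = sorted(range(len(query)),
                   key=lambda d: abs(query[d] - mean[d]),
                   reverse=True)
    return order[:k]


def nearest_neighbor(query, data, dims):
    """Filter-and-refine search: compute the full distance only when
    the cheap lower bound does not already rule the point out."""
    best_idx, best_dist, full_evals = -1, math.inf, 0
    for i, point in enumerate(data):
        if partial_distance(query, point, dims) >= best_dist:
            continue  # lower bound is no better than the current best
        full_evals += 1
        dist = full_distance(query, point)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist, full_evals


data = [[5, 1, 0, 0], [5, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
mean = [sum(col) / len(data) for col in zip(*data)]
query = [5, 0.9, 0, 0]
dims = choose_dims(query, mean, k=2)
idx, dist, full_evals = nearest_neighbor(query, data, dims)
```

Because the bound never overestimates, the pruning step cannot discard the true nearest neighbor. How many full distance computations it saves depends on the data order and on how well the chosen dimensions separate the points.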
Pages: 35-42
Page count: 8
Related papers
50 in total
  • [1] High-dimensional similarity retrieval using dimensional choice
    Tahmoush, Dave
    Samet, Hanan
    [J]. 2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOP, VOLS 1 AND 2, 2008 : 490 - 497
  • [2] High-dimensional similarity joins
    Shim, K
    Srikant, R
    Agrawal, R
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2002, 14 (01) : 156 - 171
  • [3] High-dimensional similarity joins
    Shim, K
    Srikant, R
    Agrawal, R
    [J]. 13TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING - PROCEEDINGS, 1997 : 301 - 311
  • [4] Benchmarking access structures for the similarity retrieval of high-dimensional multimedia data
    Colossi, NG
    Nascimento, MA
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000 : 1215 - 1218
  • [5] Similarity joins for high-dimensional data using Spark
    Rong, Chuitian
    Cheng, Xiaohai
    Chen, Ziliang
    Huo, Na
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (20):
  • [6] Progressive high-dimensional similarity join
    Tok, Wee Hyong
    Bressan, Stephane
    Lee, Mong-Li
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2007, 4653 : 233 - +
  • [7] Similarity Query Processing for High-Dimensional Data
    Qin, Jianbin
    Wang, Wei
    Xiao, Chuan
    Zhang, Ying
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (12) : 3437 - 3440
  • [8] Element similarity in high-dimensional materials representations
    Onwuli, Anthony
    Hegde, Ashish V.
    Nguyen, Kevin V. T.
    Butler, Keith T.
    Walsh, Aron
    [J]. DIGITAL DISCOVERY, 2023, 2 (05) : 1558 - 1564
  • [9] Similarity Learning for High-Dimensional Sparse Data
    Liu, Kuan
    Bellet, Aurelien
    Sha, Fei
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 38, 2015, 38 : 653 - 662
  • [10] Fast similarity search for high-dimensional dataset
    Wang, Quan
    You, Suya
    [J]. ISM 2006: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, PROCEEDINGS, 2006 : 799 - +