Flexible Aggregate Similarity Search in High-Dimensional Data Sets

Cited by: 0
Authors
Houle, Michael E. [1]
Ma, Xiguo [2]
Oria, Vincent [3]
Affiliations
[1] Natl Inst Informat, Tokyo 1018430, Japan
[2] Google, Mountain View, CA 94043 USA
[3] New Jersey Inst Technol, Newark, NJ 07102 USA
Funding
National Science Foundation (USA)
Keywords
NEAREST-NEIGHBOR QUERIES;
DOI
10.1007/978-3-319-25087-8_2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Numerous applications in different fields, such as spatial databases, multimedia databases, data mining and recommender systems, may benefit from efficient and effective aggregate similarity search, also known as aggregate nearest neighbor (AggNN) search. Given a group of query objects Q, the goal of AggNN is to retrieve the k most similar objects from the database, where the underlying similarity measure is defined as an aggregation (usually sum, avg or max) of the distances between the retrieved objects and every query object in Q. Recently, the problem was generalized so as to retrieve the k objects which are most similar to a fixed proportion of the elements of Q. This variant of aggregate similarity search is referred to as 'flexible AggNN', or FANN. In this work, we propose two approximation algorithms, one for the sum and avg variants of FANN, and the other for the max variant. Extensive experiments show that, relative to state-of-the-art approaches (both exact and approximate), our algorithms produce query results with good accuracy while remaining very efficient, even for real datasets of very high dimension.
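The FANN problem described in the abstract can be illustrated with a brute-force baseline. The sketch below is not the authors' approximation algorithms. It is an exhaustive scan that assumes Euclidean distance and assumes that "a fixed proportion of the elements of Q" means, for each candidate object, the ceil(phi * |Q|) query objects closest to it. The names `euclidean`, `fann_distance`, `fann_brute_force` and the parameter `phi` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fann_distance(p, queries, phi, agg="sum"):
    """Flexible aggregate distance from p to the best phi-fraction of queries.

    Assumption: the best subset of size m = ceil(phi * |Q|) for p consists of
    the m query objects nearest to p, which minimizes sum, avg and max alike.
    """
    m = max(1, math.ceil(phi * len(queries)))
    nearest = sorted(euclidean(p, q) for q in queries)[:m]
    if agg == "sum":
        return sum(nearest)
    if agg == "avg":
        return sum(nearest) / m
    if agg == "max":
        return nearest[-1]
    raise ValueError(f"unknown aggregate: {agg}")


def fann_brute_force(database, queries, k, phi, agg="sum"):
    """Return indices of the k database objects with the smallest FANN distance.

    Exhaustive scan over the database; ties are broken by lower index.
    """
    scored = sorted(
        (fann_distance(p, queries, phi, agg), i) for i, p in enumerate(database)
    )
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    database = [(0, 0), (10, 10), (1, 0), (5, 5)]
    queries = [(0, 0), (0, 1), (100, 100), (1, 1)]
    # With phi = 0.5 only the 2 nearest query objects count for each candidate,
    # so the outlying query object (100, 100) does not affect the ranking.
    print(fann_brute_force(database, queries, k=2, phi=0.5, agg="sum"))
```

Setting `phi = 1.0` reduces this sketch to ordinary AggNN, in which every query object contributes to the aggregate. An exhaustive scan like this one is useful mainly as a correctness reference, since it computes every object-to-query distance.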
Pages: 15-28
Page count: 14