Flexible Aggregate Similarity Search in High-Dimensional Data Sets

Cited by: 0

Authors:

Houle, Michael E. [1]

Ma, Xiguo [2]

Oria, Vincent [3]

Affiliations:

[1] Natl Inst Informat, Tokyo 1018430, Japan

[2] Google, Mountain View, CA 94043 USA

[3] New Jersey Inst Technol, Newark, NJ 07102 USA

Source:

SIMILARITY SEARCH AND APPLICATIONS, SISAP 2015 | 2015 / Vol. 9371

Funding:

U.S. National Science Foundation;

Keywords:

NEAREST-NEIGHBOR QUERIES;

DOI:

10.1007/978-3-319-25087-8_2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Numerous applications in different fields, such as spatial databases, multimedia databases, data mining and recommender systems, may benefit from efficient and effective aggregate similarity search, also known as aggregate nearest neighbor (AggNN) search. Given a group of query objects Q, the goal of AggNN is to retrieve the k most similar objects from the database, where the underlying similarity measure is defined as an aggregation (usually sum, avg or max) of the distances between the retrieved objects and every query object in Q. Recently, the problem was generalized so as to retrieve the k objects which are most similar to a fixed proportion of the elements of Q. This variant of aggregate similarity search is referred to as 'flexible AggNN', or FANN. In this work, we propose two approximation algorithms, one for the sum and avg variants of FANN, and the other for the max variant. Extensive experiments are provided showing that, relative to state-of-the-art approaches (both exact and approximate), our algorithms produce query results with good accuracy, while at the same time being very efficient - even for real datasets of very high dimension.
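To make the problem statement in the abstract concrete, the following is a minimal brute-force sketch of a FANN query. It is an exact baseline for illustration only, not the approximation algorithms proposed in the paper. It assumes that "a fixed proportion of the elements of Q" means that, for each database object, only its closest ceil(phi * |Q|) query objects are aggregated; the names `fann_brute_force`, `phi`, `agg` and `dist` are hypothetical, and Euclidean distance is an arbitrary choice. Setting `phi=1.0` gives the ordinary AggNN query.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fann_brute_force(database, queries, k, phi=1.0, agg="sum", dist=euclidean):
    """Return the k best (score, index) pairs, lowest score first.

    Assumption: each object is scored by aggregating the distances to its
    m = ceil(phi * |Q|) closest query objects. phi = 1.0 reduces to AggNN.
    """
    if not 0 < phi <= 1:
        raise ValueError("phi must be in (0, 1]")
    m = max(1, math.ceil(phi * len(queries)))
    scored = []
    for idx, obj in enumerate(database):
        # The m smallest distances from this object to the query group,
        # returned in ascending order.
        nearest = heapq.nsmallest(m, (dist(obj, q) for q in queries))
        if agg == "sum":
            score = sum(nearest)
        elif agg == "avg":
            score = sum(nearest) / m
        elif agg == "max":
            score = nearest[-1]
        else:
            raise ValueError("agg must be 'sum', 'avg' or 'max'")
        scored.append((score, idx))
    return heapq.nsmallest(k, scored)


# Toy data: two tight pairs of query points and three candidate objects.
database = [(0, 0), (10, 10), (5, 5)]
queries = [(0, 0), (0, 1), (10, 10), (10, 11)]

# With phi = 0.5 each object only has to be close to half of the query group.
top_two = fann_brute_force(database, queries, k=2, phi=0.5, agg="sum")
```

This scan costs one distance computation per (object, query) pair, which is the kind of cost that index-based and approximate methods aim to avoid on large, high-dimensional data.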


Pages: 15-28

Number of pages: 14
