A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor Search

Cited by: 1

Authors:

Vecchiato, Thomas [1]

Lucchese, Claudio [1]

Nardini, Franco Maria [2]

Bruch, Sebastian [3]

Affiliations:

[1] Ca Foscari Univ Venice, Venice, Italy

[2] ISTI CNR, Pisa, Italy

[3] Pinecone, New York, NY USA

Source:

PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024

Keywords:

Approximate Nearest Neighbor Search; Inverted File; Learning to Rank; EFFICIENT

DOI:

10.1145/3626772.3657931

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A critical piece of the modern information retrieval puzzle is approximate nearest neighbor search. Its objective is to return a set of k data points that are closest to a query point, with its accuracy measured by the proportion of exact nearest neighbors captured in the returned set. One popular approach to this question is clustering: The indexing algorithm partitions data points into non-overlapping subsets and represents each partition by a point such as its centroid. The query processing algorithm first identifies the nearest clusters (a process known as routing), then performs a nearest neighbor search over those clusters only. In this work, we make a simple observation: The routing function solves a ranking problem. Its quality can therefore be assessed with a ranking metric, making the function amenable to learning-to-rank. Interestingly, ground-truth is often freely available: Given a query distribution in a top-k configuration, the ground-truth is the set of clusters that contain the exact top-k vectors. We develop this insight and apply it to Maximum Inner Product Search (MIPS). As we demonstrate empirically on various datasets, learning a simple linear function consistently improves the accuracy of clustering-based MIPS.
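
The pipeline described in the abstract can be illustrated with a minimal sketch, written here in plain Python. It is not the authors' implementation: every function name (`route`, `search`, `ground_truth_clusters`, `recall`), the six toy vectors, and the cluster assignment are assumptions made up for illustration. The sketch routes a query by the inner product between the query and each cluster centroid, searches only the probed clusters, and derives the ground-truth label set as the clusters that hold the exact top-k vectors.

```python
from typing import List, Sequence, Set

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product, the similarity used in MIPS."""
    return sum(a * b for a, b in zip(u, v))


def top_indices(scores: Sequence[float], n: int) -> List[int]:
    """Indices of the n largest scores, best first (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:n]


def centroids(data: Sequence[Vector], assignment: Sequence[int],
              n_clusters: int) -> List[List[float]]:
    """Mean vector of each cluster; one possible choice of representative."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for x, c in zip(data, assignment):
        counts[c] += 1
        for j, value in enumerate(x):
            sums[c][j] += value
    return [[s / counts[c] for s in sums[c]] for c in range(n_clusters)]


def route(query: Vector, representatives: Sequence[Vector], n_probe: int) -> List[int]:
    """Routing: rank clusters by <query, representative> and keep the best n_probe."""
    return top_indices([dot(query, r) for r in representatives], n_probe)


def search(query: Vector, data: Sequence[Vector], assignment: Sequence[int],
           representatives: Sequence[Vector], k: int, n_probe: int) -> List[int]:
    """Approximate top-k: exact search restricted to the routed clusters."""
    probed = set(route(query, representatives, n_probe))
    candidates = [i for i, c in enumerate(assignment) if c in probed]
    candidates.sort(key=lambda i: (-dot(query, data[i]), i))
    return candidates[:k]


def ground_truth_clusters(query: Vector, data: Sequence[Vector],
                          assignment: Sequence[int], k: int) -> Set[int]:
    """Ranking labels: the clusters that contain the exact top-k vectors."""
    exact = top_indices([dot(query, x) for x in data], k)
    return {assignment[i] for i in exact}


def recall(returned: Sequence[int], exact: Sequence[int]) -> float:
    """Fraction of the exact top-k captured in the returned set."""
    return len(set(returned) & set(exact)) / len(exact)


# Toy data (hypothetical): two clusters of three 2-D points each.
data = [(1.0, 0.0), (0.8, 0.2), (0.6, 0.95),   # cluster 0
        (0.0, 1.0), (0.2, 0.8), (0.1, 0.9)]    # cluster 1
assignment = [0, 0, 0, 1, 1, 1]
reps = centroids(data, assignment, n_clusters=2)
query = (0.0, 1.0)

exact = top_indices([dot(query, x) for x in data], 2)
labels = ground_truth_clusters(query, data, assignment, k=2)
approx = search(query, data, assignment, reps, k=2, n_probe=1)
print(exact, labels, approx, recall(approx, exact))
```

In this toy setup the exact top-2 vectors sit in different clusters, so probing a single centroid-ranked cluster misses one of them, while the label set marks both clusters as relevant. Under the abstract's framing, a learned linear router would replace `reps` with learned vectors trained so that the labelled clusters rank first; that training step is not shown here.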

Pages: 2261-2265

Number of pages: 5
