A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor Search

Cited by: 1

Authors:

Vecchiato, Thomas [1]

Lucchese, Claudio [1]

Nardini, Franco Maria [2]

Bruch, Sebastian [3]

Affiliations:

[1] Ca Foscari Univ Venice, Venice, Italy

[2] ISTI CNR, Pisa, Italy

[3] Pinecone, New York, NY USA

Source:

PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024

Keywords:

Approximate Nearest Neighbor Search; Inverted File; Learning to Rank; EFFICIENT

DOI:

10.1145/3626772.3657931

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A critical piece of the modern information retrieval puzzle is approximate nearest neighbor search. Its objective is to return a set of k data points that are closest to a query point, with its accuracy measured by the proportion of exact nearest neighbors captured in the returned set. One popular approach to this question is clustering: The indexing algorithm partitions data points into non-overlapping subsets and represents each partition by a point such as its centroid. The query processing algorithm first identifies the nearest clusters (a process known as routing), then performs a nearest neighbor search over those clusters only. In this work, we make a simple observation: The routing function solves a ranking problem. Its quality can therefore be assessed with a ranking metric, making the function amenable to learning-to-rank. Interestingly, ground truth is often freely available: Given a query distribution in a top-k configuration, the ground truth is the set of clusters that contain the exact top-k vectors. We develop this insight and apply it to Maximum Inner Product Search (MIPS). As we demonstrate empirically on various datasets, learning a simple linear function consistently improves the accuracy of clustering-based MIPS.
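The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy data, the helper names (`route`, `exact_top_k`, `ground_truth_clusters`, `train_router`), the learning rate, and the choice of softmax cross-entropy as the training loss for top-1 routing are all assumptions made here for illustration. The abstract states only that routing is a ranking problem, that the ground truth is the set of clusters containing the exact top-k vectors, and that a linear function is learned.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def centroid(points):
    # Mean of a non-empty list of equal-length vectors.
    return [sum(col) / len(points) for col in zip(*points)]

def route(query, representatives, nprobe):
    # Rank clusters by inner product with their representative; keep the top nprobe.
    order = sorted(range(len(representatives)),
                   key=lambda c: dot(query, representatives[c]), reverse=True)
    return order[:nprobe]

def exact_top_k(query, data, k):
    # Brute-force MIPS over the whole collection.
    order = sorted(range(len(data)), key=lambda i: dot(query, data[i]), reverse=True)
    return order[:k]

def ground_truth_clusters(query, data, assignment, k):
    # Clusters that contain the exact top-k vectors for this query.
    return {assignment[i] for i in exact_top_k(query, data, k)}

def train_router(queries, labels, init, lr=0.1, epochs=50):
    # Assumption: one weight vector per cluster, fitted with softmax
    # cross-entropy against the cluster holding the exact top-1 vector.
    W = [row[:] for row in init]
    for _ in range(epochs):
        for q, y in zip(queries, labels):
            scores = [dot(w, q) for w in W]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            for c, w in enumerate(W):
                grad = exps[c] / z - (1.0 if c == y else 0.0)
                for j in range(len(w)):
                    w[j] -= lr * grad * q[j]
    return W

# Hypothetical toy collection: cluster 1 holds two large-norm points whose
# centroid is a poor representative under inner product.
data = [[1.0, 0.0], [0.8, 0.0], [3.0, 0.0], [-3.0, 0.5]]
assignment = [0, 0, 1, 1]
centroids = [centroid([p for p, a in zip(data, assignment) if a == c])
             for c in range(2)]

queries = [[1.0, 0.0], [1.0, 0.1], [-0.1, -1.0]]
# Top-1 ground truth: the single cluster containing the exact nearest vector.
labels = [next(iter(ground_truth_clusters(q, data, assignment, 1))) for q in queries]

before = [route(q, centroids, 1)[0] for q in queries]  # centroid routing
learned = train_router(queries, labels, init=centroids)
after = [route(q, learned, 1)[0] for q in queries]     # learned linear routing

print("labels:", labels)
print("centroid routing:", before)
print("learned routing:", after)
```

On this toy input, centroid routing sends the first two queries to a cluster that does not contain their exact nearest vector, while the learned weight vectors, initialized from the centroids, route all three queries to their ground-truth cluster. The example trains and evaluates on the same queries, so it shows the mechanics only and says nothing about generalization.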

Citation

Pages: 2261-2265

Page count: 5

