A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor Search

Cited by: 1
Authors
Vecchiato, Thomas [1]
Lucchese, Claudio [1]
Nardini, Franco Maria [2]
Bruch, Sebastian [3]
Affiliations
[1] Ca' Foscari University of Venice, Venice, Italy
[2] ISTI CNR, Pisa, Italy
[3] Pinecone, New York, NY, USA
Source
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024
Keywords
Approximate Nearest Neighbor Search; Inverted File; Learning to Rank; EFFICIENT
DOI
10.1145/3626772.3657931
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A critical piece of the modern information retrieval puzzle is approximate nearest neighbor search. Its objective is to return a set of k data points that are closest to a query point, with its accuracy measured by the proportion of exact nearest neighbors captured in the returned set. One popular approach to this question is clustering: The indexing algorithm partitions data points into non-overlapping subsets and represents each partition by a point such as its centroid. The query processing algorithm first identifies the nearest clusters (a process known as routing), then performs a nearest neighbor search over those clusters only. In this work, we make a simple observation: The routing function solves a ranking problem. Its quality can therefore be assessed with a ranking metric, making the function amenable to learning-to-rank. Interestingly, ground-truth is often freely available: Given a query distribution in a top-k configuration, the ground-truth is the set of clusters that contain the exact top-k vectors. We develop this insight and apply it to Maximum Inner Product Search (MIPS). As we demonstrate empirically on various datasets, learning a simple linear function consistently improves the accuracy of clustering-based MIPS.
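The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy k-means, the synthetic Gaussian data, and every name here (`build_index`, `route`, `search`, `relevant_clusters`, `router_vectors`, `num_probe`) are assumptions made for illustration. Routing scores each cluster by the inner product between the query and one vector per cluster. Passing the centroids gives the usual baseline, and a learned linear routing function could plausibly be plugged in by passing learned vectors instead. `relevant_clusters` computes the ground-truth labels mentioned in the abstract: the clusters that contain the exact top-k points.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def top_k(scores, k):
    """Indices of the k largest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]


def build_index(points, num_clusters, rng, iters=5):
    """Toy k-means (assumption: any partitioning method would do here)."""
    dim = len(points[0])
    centroids = rng.sample(points, num_clusters)
    clusters = []
    for _ in range(iters):
        # Assign every point to exactly one cluster (non-overlapping partition).
        clusters = [[] for _ in range(num_clusters)]
        for idx, p in enumerate(points):
            nearest = min(range(num_clusters), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(idx)
        # Represent each non-empty cluster by the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [
                    sum(points[i][d] for i in members) / len(members) for d in range(dim)
                ]
    return centroids, clusters


def route(query, router_vectors, num_probe):
    """Rank clusters by inner product with the query and keep the best ones."""
    return top_k([dot(query, w) for w in router_vectors], num_probe)


def search(query, points, clusters, router_vectors, k, num_probe):
    """Search only the points stored in the routed clusters."""
    candidates = [i for c in route(query, router_vectors, num_probe) for i in clusters[c]]
    candidates.sort(key=lambda i: -dot(query, points[i]))
    return candidates[:k]


def exact_top_k(query, points, k):
    """Brute-force maximum inner product search."""
    return top_k([dot(query, p) for p in points], k)


def relevant_clusters(query, points, clusters, k):
    """Ground truth for routing: clusters holding at least one exact top-k point."""
    exact = set(exact_top_k(query, points, k))
    return {c for c, members in enumerate(clusters) if exact & set(members)}


def recall(returned, exact):
    """Proportion of exact nearest neighbors captured in the returned set."""
    return len(set(returned) & set(exact)) / len(exact)


rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
centroids, clusters = build_index(points, num_clusters=10, rng=rng)

query = [rng.gauss(0, 1) for _ in range(8)]
exact = exact_top_k(query, points, k=5)
labels = relevant_clusters(query, points, clusters, k=5)
approx = search(query, points, clusters, centroids, k=5, num_probe=3)

print("relevant clusters:", sorted(labels))
print("recall@5 with 3 probes:", recall(approx, exact))
```

Because `relevant_clusters` needs only the index and an exact search, labels of this kind can be generated for any sample of queries without manual annotation, which is what makes the routing step a candidate for a ranking objective.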
Pages: 2261-2265
Page count: 5