A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor Search

Cited by: 1
Authors
Vecchiato, Thomas [1]
Lucchese, Claudio [1]
Nardini, Franco Maria [2]
Bruch, Sebastian [3]
Affiliations
[1] Ca Foscari Univ Venice, Venice, Italy
[2] ISTI CNR, Pisa, Italy
[3] Pinecone, New York, NY USA
Source
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024 | 2024
Keywords
Approximate Nearest Neighbor Search; Inverted File; Learning to Rank; EFFICIENT
DOI
10.1145/3626772.3657931
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A critical piece of the modern information retrieval puzzle is approximate nearest neighbor search. Its objective is to return a set of k data points that are closest to a query point, with its accuracy measured by the proportion of exact nearest neighbors captured in the returned set. One popular approach to this question is clustering: The indexing algorithm partitions data points into non-overlapping subsets and represents each partition by a point such as its centroid. The query processing algorithm first identifies the nearest clusters (a process known as routing), then performs a nearest neighbor search over those clusters only. In this work, we make a simple observation: The routing function solves a ranking problem. Its quality can therefore be assessed with a ranking metric, making the function amenable to learning-to-rank. Interestingly, ground-truth is often freely available: Given a query distribution in a top-k configuration, the ground-truth is the set of clusters that contain the exact top-k vectors. We develop this insight and apply it to Maximum Inner Product Search (MIPS). As we demonstrate empirically on various datasets, learning a simple linear function consistently improves the accuracy of clustering-based MIPS.
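The pipeline described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the toy Gaussian data, the one-shot seed-based partition, the softmax cross-entropy objective, and every function name and hyperparameter below are assumptions chosen for illustration. The sketch shows routing as a ranking of clusters by inner product, ground-truth labels derived from the clusters that hold the exact top-k points, and a linear routing function (one learned vector per cluster) initialized from the centroids.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def exact_top_k(query, points, k):
    """Indices of the k points with the largest inner product with the query."""
    return sorted(range(len(points)), key=lambda i: -dot(query, points[i]))[:k]

def centroids(points, assignment, n_clusters):
    """Mean of each cluster's members: the baseline cluster representatives."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for p, c in zip(points, assignment):
        counts[c] += 1
        for j, x in enumerate(p):
            sums[c][j] += x
    return [[s / max(n, 1) for s in row] for row, n in zip(sums, counts)]

def route(query, routers, n_probe):
    """Routing as ranking: order clusters by <query, router>, keep the top n_probe."""
    scores = [dot(query, r) for r in routers]
    return sorted(range(len(routers)), key=lambda c: -scores[c])[:n_probe]

def search(query, points, assignment, routers, n_probe, k):
    """Scan only the points that live in the routed clusters."""
    probed = set(route(query, routers, n_probe))
    candidates = [i for i, c in enumerate(assignment) if c in probed]
    return sorted(candidates, key=lambda i: -dot(query, points[i]))[:k]

def ground_truth_clusters(query, points, assignment, k):
    """Free supervision: the clusters that contain the exact top-k points."""
    return {assignment[i] for i in exact_top_k(query, points, k)}

def train_routers(queries, labels, init, lr=0.1, epochs=20):
    """Linear routing function fit with softmax cross-entropy (an assumed objective)."""
    routers = [list(r) for r in init]
    for _ in range(epochs):
        for q, y in zip(queries, labels):
            scores = [dot(q, r) for r in routers]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            for c, r in enumerate(routers):
                g = (1.0 if c == y else 0.0) - exps[c] / z
                for j, x in enumerate(q):
                    r[j] += lr * g * x  # gradient ascent on the log-likelihood
    return routers

def top1_hit_rate(queries, labels, routers):
    """Fraction of queries whose first-ranked cluster holds the exact top-1 point."""
    hits = sum(route(q, routers, 1)[0] == y for q, y in zip(queries, labels))
    return hits / len(queries)

# Toy setup (all sizes are arbitrary assumptions).
random.seed(0)
dim, n_points, n_clusters = 8, 200, 10
points = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_points)]

# Crude partition: assign each point to its nearest randomly chosen seed point.
seeds = random.sample(points, n_clusters)
assignment = [
    min(range(n_clusters), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, seeds[c])))
    for p in points
]

base = centroids(points, assignment, n_clusters)
train_q = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(300)]
labels = [assignment[exact_top_k(q, points, 1)[0]] for q in train_q]
learned = train_routers(train_q, labels, base)

print("centroid routing, top-1 hit rate:", top1_hit_rate(train_q, labels, base))
print("learned routing,  top-1 hit rate:", top1_hit_rate(train_q, labels, learned))
```

The printed hit rates are measured on the same queries used for training, so they only indicate how well the linear function fits the labels, not how it generalizes. Probing every cluster (`n_probe = n_clusters`) reduces `search` to exact search, which is a convenient sanity check for the routing code.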
Pages: 2261-2265
Number of pages: 5