Learning to Index in Large-Scale Datasets

Times cited: 0
Authors
Prayoonwong, Amorntip [1]
Wang, Cheng-Hsien [1]
Chiu, Chih-Yi [1]
Affiliation
[1] National Chiayi University, 300 Syuefu Rd, Chiayi 60004, Taiwan
Keywords
Nearest neighbor search; Deep neural networks; Nearest neighbor graph; Residual vector quantization; Index structures; Quantization
DOI
10.1007/978-3-319-73603-7_25
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, we present a novel ranking scheme that learns the nearest neighbor relation embedded in an index structure. Given a query point, a straightforward way to rank the clusters of the index structure is by their Euclidean distance to the query, from nearest to farthest. However, the loss introduced by data quantization inevitably degrades index accuracy. To address this problem, the proposed method ranks clusters by their nearest neighbor probabilities rather than by their Euclidean distances. We present two algorithms, one for offline training and one for online indexing, that leverage deep neural networks to learn the neighborhood relation. The proposed method can replace the distance-based ranking scheme and can be integrated with other nearest neighbor search methods to boost their retrieval accuracy. Experiments on million-scale and billion-scale datasets demonstrate promising results for the proposed ranking scheme.
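The contrast between distance-based and probability-based cluster ranking can be illustrated with a small Python sketch. It is not the authors' implementation, and it rests on three loudly stated assumptions. First, it uses a flat index in which each database point is assigned to its nearest centroid, not the residual-vector-quantization structure named in the keywords. Second, it defines a cluster's nearest neighbor probability as the fraction of the query's k exact nearest neighbors that fall in that cluster. Third, it substitutes a linear softmax model for the deep neural network. All function names (`rank_by_distance`, `nn_cluster_probs`, `train_ranker`, `rank_by_probability`) are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_by_distance(query, centroids):
    """Baseline: cluster ids ordered from nearest to farthest centroid."""
    return sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))


def assign(points, centroids):
    """Flat index (assumption): each database point goes to its nearest centroid."""
    return [rank_by_distance(p, centroids)[0] for p in points]


def nn_cluster_probs(query, points, labels, n_clusters, k):
    """Hypothetical training target: the fraction of the query's k exact
    nearest neighbors that fall in each cluster."""
    nn = sorted(range(len(points)), key=lambda i: sq_dist(query, points[i]))[:k]
    probs = [0.0] * n_clusters
    for i in nn:
        probs[labels[i]] += 1.0 / k
    return probs


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(query, W, b):
    """One linear score per cluster: W[c] . query + b[c]."""
    return [sum(w * x for w, x in zip(row, query)) + bias for row, bias in zip(W, b)]


def train_ranker(queries, targets, n_clusters, epochs=200, lr=0.1):
    """Stand-in for the deep network (assumption): a linear softmax model
    trained by per-sample gradient descent on cross-entropy against soft targets."""
    dim = len(queries[0])
    W = [[0.0] * dim for _ in range(n_clusters)]
    b = [0.0] * n_clusters
    for _ in range(epochs):
        for q, t in zip(queries, targets):
            p = softmax(logits(q, W, b))
            for c in range(n_clusters):
                g = p[c] - t[c]  # d(cross-entropy)/d(logit c)
                b[c] -= lr * g
                for j in range(dim):
                    W[c][j] -= lr * g * q[j]
    return W, b


def rank_by_probability(query, W, b):
    """Learned ranking: cluster ids ordered by predicted NN probability."""
    p = softmax(logits(query, W, b))
    return sorted(range(len(p)), key=lambda c: -p[c])


if __name__ == "__main__":
    centroids = [[-5.0], [5.0]]
    points = [[-5.0], [-4.0], [-0.1], [0.2], [4.0], [5.0]]
    labels = assign(points, centroids)  # [0, 0, 0, 1, 1, 1]

    # Quantization loss in miniature: the query is closer to centroid 1,
    # but its exact nearest neighbor (-0.1) was assigned to cluster 0.
    print(rank_by_distance([0.03], centroids))             # [1, 0]
    print(nn_cluster_probs([0.03], points, labels, 2, 1))  # [1.0, 0.0]

    # Offline training on soft targets, then online ranking by probability.
    train_q = [[-5.0], [-4.0], [4.0], [5.0]]
    train_t = [nn_cluster_probs(q, points, labels, 2, 1) for q in train_q]
    W, b = train_ranker(train_q, train_t, n_clusters=2)
    print(rank_by_probability([-3.0], W, b))  # [0, 1]
```

In the toy data, the query 0.03 lies slightly closer to centroid 1, yet its exact nearest neighbor (-0.1) was assigned to cluster 0, so distance-based ranking probes the wrong cluster first. This is the kind of quantization loss the abstract refers to. A learned ranker trained on enough such queries is intended to capture these cases, although the tiny linear model here is fit to only four training queries and serves only to show the offline-training and online-ranking split.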
Pages: 305-316
Number of pages: 12
Related papers
  • [1] Learning Bayesian Network Structure from Large-scale Datasets
    Hong, Yu
    Xia, Xiaoling
    Le, Jiajin
    Zhou, Xiangdong
    [J]. 2016 FOURTH INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA (CBD 2016), 2016, : 258 - 264
  • [2] Datasets, tasks, and training methods for large-scale hypergraph learning
    Kim, Sunwoo
    Lee, Dongjin
    Kim, Yul
    Park, Jungho
    Hwang, Taeho
    Shin, Kijung
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 37 (06) : 2216 - 2254
  • [3] Learning From Noisy Large-Scale Datasets With Minimal Supervision
    Veit, Andreas
    Alldrin, Neil
    Chechik, Gal
    Krasin, Ivan
    Gupta, Abhinav
    Belongie, Serge
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6575 - 6583
  • [4] MMSVC: An Efficient Unsupervised Learning Approach for Large-Scale Datasets
    Gu, Hong
    Zhao, Guangzhou
    Zhang, Jianliang
    [J]. LIFE SYSTEM MODELING AND INTELLIGENT COMPUTING, 2010, 6330 : 1 - 9
  • [5] MMSVC: An efficient unsupervised learning approach for large-scale datasets
    Gu, Hong
    Zhao, Guangzhou
    Zhang, Jianliang
    [J]. NEUROCOMPUTING, 2012, 98 : 114 - 122
  • [6] A fast diagonal distance metric learning approach for large-scale datasets
    Li, Tie
    Kou, Gang
    Peng, Yi
    Yu, Philip S.
    [J]. INFORMATION SCIENCES, 2021, 571 : 225 - 245
  • [7] Harnessing Large-Scale Herbarium Image Datasets Through Representation Learning
    Walker, Barnaby E.
    Tucker, Allan
    Nicolson, Nicky
    [J]. FRONTIERS IN PLANT SCIENCE, 2022, 12
  • [8] Large-Scale Learning with Structural Kernels for Class-Imbalanced Datasets
    Severyn, Aliaksei
    Moschitti, Alessandro
    [J]. ETERNAL SYSTEMS, 2012, 255 : 34 - 41
  • [9] Visualization of large-scale trajectory datasets
    Zachar, Gergely
    [J]. 2023 CYBER-PHYSICAL SYSTEMS AND INTERNET-OF-THINGS WEEK, CPS-IOT WEEK WORKSHOPS, 2023, : 152 - 157