Balanced compact clustering for efficient range queries in metric spaces

Cited by: 2
Authors
Ceselli, Alberto [1]
Colombo, Fabio [2]
Cordone, Roberto [2]
Affiliations
[1] Univ Milan, Dipartimento Informat, I-26013 Crema, Italy
[2] Univ Milan, Dipartimento Informat, I-20135 Milan, Italy
Keywords
Similarity search; Clustering; Information retrieval; Integer programming; Tabu search
DOI
10.1016/j.dam.2013.12.019
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Given a set of points in a metric space, an additional query point and a positive threshold, a range query determines the subset of points whose distance from the query point does not exceed the given threshold. This paper tackles the problem of clustering the set of points so as to minimize the number of distance evaluations required by a range query. This problem models the efficient extraction of information from a database when the user is not interested in exact-match retrieval, but in the search for similar items. Since this need has become widespread in the management of text, image, audio and video databases, several data structures have been proposed to support such queries. Their optimization, however, is still left to extremely simple heuristic rules, if not to random choices. We propose the Balanced Compact Clustering Problem (BCCP) as a combinatorial model of this problem. We discuss its approximation properties and the complexity of special cases. Then, we present two Integer Programming formulations, prove their equivalence and introduce valid inequalities and variable fixing procedures. We discuss the application of a general-purpose solver to the more efficient formulation. Finally, we describe a Tabu Search algorithm and discuss its application to randomly generated and to real-world benchmark instances of up to one hundred thousand points. © 2013 Elsevier B.V. All rights reserved.
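To illustrate why the clustering affects the number of distance evaluations, here is a minimal Python sketch of a range query over clustered points. It is not the BCCP model or the Tabu Search algorithm from the paper. It assumes a generic center-plus-covering-radius scheme in which a cluster is skipped when the triangle inequality shows that none of its members can lie within the threshold. The names `Cluster`, `range_query` and `euclidean` are hypothetical and chosen for this example only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Example metric; any function satisfying the metric axioms would do."""
    return math.dist(a, b)


class Cluster:
    """Hypothetical cluster: a center, its other members, and a covering radius."""

    def __init__(self, center: Point, members: Sequence[Point], dist: Distance):
        self.center = center
        self.members = list(members)  # points other than the center
        # Covering radius: largest distance from the center to any member.
        self.radius = max((dist(center, p) for p in self.members), default=0.0)


def range_query(
    clusters: Sequence[Cluster], query: Point, threshold: float, dist: Distance
) -> Tuple[List[Point], int]:
    """Return the points within `threshold` of `query` and the number of
    distance evaluations performed at query time."""
    matches: List[Point] = []
    evaluations = 0
    for cluster in clusters:
        d_center = dist(query, cluster.center)
        evaluations += 1
        if d_center <= threshold:
            matches.append(cluster.center)
        # Triangle inequality: every member p has
        # d(query, p) >= d(query, center) - radius, so the cluster
        # cannot contain a match when that lower bound exceeds the threshold.
        if d_center - cluster.radius > threshold:
            continue
        for p in cluster.members:
            evaluations += 1
            if dist(query, p) <= threshold:
                matches.append(p)
    return matches, evaluations
```

In this sketch the cost of a query is one evaluation per cluster center plus one per member of every cluster that cannot be pruned, so both the number of clusters and their size and radius matter. This is one plausible reading of why the abstract asks for clusters that are both balanced and compact.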
Pages: 43-67
Number of pages: 25