APPROXIMATE CONSISTENT WEIGHTED SAMPLING FOR EFFICIENT TOP-K SEARCH

Citations: 0
Authors
Kim, Yunna [1]
Hwang, Heasoo [1]
Affiliations
[1] Univ Seoul, Dept Comp Sci & Engn, 163 Seoulsiripdaero, Seoul 02504, South Korea
Keywords
Generalized Jaccard similarity; Minwise hashing; Weighted sampling
DOI
10.24507/ijicic.16.03.1125
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Top-k search on weighted sets can be very slow because the cost of computing generalized Jaccard similarity is proportional to the dimensionality of the sets. ICWS generates high-quality samples, but its hashing cost is too high for high-dimensional weighted sets. We propose simple hashing methods, ICWS P and its variants, that approximate ICWS very efficiently in O(D · K/B). Extensive experiments show that the hashing cost is reduced significantly, while top-k precision and classification accuracy based on the estimated set similarity remain almost as high as those of ICWS. Query time can also improve, since fewer than K samples are compared for sets of low similarity.
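To make the abstract's cost argument concrete, the following is a minimal Python sketch of the baseline it refers to: exact generalized Jaccard similarity, and an estimate of it from standard ICWS samples. It is not the proposed ICWS P method, whose details are not given in this record. The function names, the dense-vector representation, and the shared `seed` are assumptions made for illustration, and each set is assumed to have at least one positive weight.

```python
import numpy as np


def generalized_jaccard(s, t):
    """Exact generalized Jaccard: sum(min) / sum(max) over all dimensions."""
    s = np.asarray(s, dtype=float)
    t = np.asarray(t, dtype=float)
    denom = np.maximum(s, t).sum()
    return float(np.minimum(s, t).sum() / denom) if denom > 0 else 0.0


def icws_samples(weights, num_samples, seed=0):
    """Draw num_samples standard ICWS samples (element index, t) from one set.

    Sets that are to be compared must use the same seed, num_samples, and
    dimensionality so that they share the same random draws.
    Assumes at least one weight is positive.
    """
    w = np.asarray(weights, dtype=float)
    dim = w.shape[0]
    rng = np.random.default_rng(seed)
    # Random variables drawn per (sample, dimension), shared across sets.
    r = rng.gamma(2.0, 1.0, size=(num_samples, dim))
    c = rng.gamma(2.0, 1.0, size=(num_samples, dim))
    beta = rng.uniform(0.0, 1.0, size=(num_samples, dim))

    active = np.flatnonzero(w > 0)  # only positive weights can be sampled
    r_a, c_a, beta_a = r[:, active], c[:, active], beta[:, active]
    t = np.floor(np.log(w[active]) / r_a + beta_a)
    ln_y = r_a * (t - beta_a)
    ln_a = np.log(c_a) - ln_y - r_a  # log of a = c / (y * exp(r))
    best = np.argmin(ln_a, axis=1)   # element with the smallest a per sample
    rows = np.arange(num_samples)
    return np.stack([active[best], t[rows, best].astype(np.int64)], axis=1)


def estimate_similarity(samples_a, samples_b):
    """Fraction of positions where both the element and t agree."""
    return float(np.mean(np.all(samples_a == samples_b, axis=1)))
```

In this sketch, hashing one set touches every positive-weight dimension once per sample, which is the cost that grows with dimensionality and sample count. A possible usage: call `icws_samples` on two weight vectors with the same `seed` and compare `estimate_similarity` of the results against `generalized_jaccard` of the vectors.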
Pages: 1125-1132
Page count: 8
Related papers
50 in total
  • [1] Efficient Top-k Closeness Centrality Search
    Olsen, Paul W., Jr.
    Labouseur, Alan G.
    Hwang, Jeong-Hyon
    [J]. 2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014, : 196 - 207
  • [2] Efficient Retrieval of Top-k Weighted Spatial Triangles
    Taniguchi, Ryosuke
    Amagata, Daichi
    Hara, Takahiro
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2022, PT I, 2022, : 224 - 231
  • [3] Semantic enhanced Top-k similarity search on weighted HIN
    Yun Zhang
    Minghe Yu
    Tiancheng Zhang
    Ge Yu
    [J]. Neural Computing and Applications, 2022, 34 : 16911 - 16927
  • [4] Semantic enhanced Top-k similarity search on weighted HIN
    Zhang, Yun
    Yu, Minghe
    Zhang, Tiancheng
    Yu, Ge
    [J]. NEURAL COMPUTING & APPLICATIONS, 2022, 34 (19): : 16911 - 16927
  • [5] Efficient Top-k Keyword Search on XML Streams
    Li, Lingli
    Wang, Hongzhi
    Li, Jianzhong
    Luo, Jizhou
    [J]. PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS, VOLS 1-5, 2008, : 1041 - 1046
  • [6] Efficient Top-k Edge Structural Diversity Search
    Zhang, Qi
    Li, Rong-Hua
    Yang, Qixuan
    Wang, Guoren
    Qin, Lu
    [J]. 2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020, : 205 - 216
  • [7] Efficient Top-k Ego-Betweenness Search
    Zhang, Qi
    Li, Rong-Hua
    Pan, Minjia
    Dai, Yongheng
    Wang, Guoren
    Yuan, Ye
    [J]. 2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 380 - 392
  • [8] Efficient Top-k Approximate Subtree Matching in Small Memory
    Augsten, Nikolaus
    Barbosa, Denilson
    Boehlen, Michael M.
    Palpanas, Themis
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (08) : 1123 - 1137
  • [9] Efficient Compressed Indexing for Approximate Top-k String Retrieval
    Ferrada, Hector
    Navarro, Gonzalo
    [J]. STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2014, 2014, 8799 : 18 - 30
  • [10] Approximate top-k structural similarity search over XML documents
    Xie, T
    Sha, CF
    Wang, XL
    Zhou, AY
    [J]. FRONTIERS OF WWW RESEARCH AND DEVELOPMENT - APWEB 2006, PROCEEDINGS, 2006, 3841 : 319 - 330