APPROXIMATE CONSISTENT WEIGHTED SAMPLING FOR EFFICIENT TOP-K SEARCH

Authors
Kim, Yunna [1]
Hwang, Heasoo [1]
Affiliations
[1] Univ Seoul, Dept Comp Sci & Engn, 163 Seoulsiripdaero, Seoul 02504, South Korea
Keywords
Generalized Jaccard similarity; Minwise hashing; Weighted sampling
DOI
10.24507/ijicic.16.03.1125
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Top-k search on weighted sets can be very slow since the computation cost of generalized Jaccard similarity is proportional to the dimensionality of the sets. ICWS (improved consistent weighted sampling) generates samples of high quality, but its hashing cost is too high to generate samples from high-dimensional weighted sets. We propose simple hashing methods, ICWS P and its variants, that approximate ICWS very efficiently in O(D · K/B) time. Extensive experiments show that hashing cost is reduced significantly while top-k precision and classification accuracy with estimated set similarity are almost as high as those of ICWS. Query time can also be improved since fewer than K samples are compared for sets of low similarity.
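For readers unfamiliar with the baseline, the following is a minimal Python sketch of exact generalized Jaccard similarity and of standard ICWS sampling as it is usually described in the literature. It is not the ICWS P method proposed in the paper, whose details are not given in this record. The function names, the dictionary representation of weighted sets, and the per-element seeding scheme are all assumptions made for illustration. Note that each of the K samples visits every nonzero element, which is the hashing cost the abstract refers to.

```python
import math
import random


def generalized_jaccard(s, t):
    """Exact generalized Jaccard: sum of min weights over sum of max weights."""
    keys = set(s) | set(t)
    num = sum(min(s.get(k, 0.0), t.get(k, 0.0)) for k in keys)
    den = sum(max(s.get(k, 0.0), t.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


def icws_sketch(weights, num_samples, seed=0):
    """Draw num_samples standard ICWS samples (element, t) from a weighted set.

    weights: dict mapping element -> positive weight (hypothetical format).
    """
    positive = {k: w for k, w in weights.items() if w > 0}
    if not positive:
        raise ValueError("weighted set has no positive weights")
    sketch = []
    for i in range(num_samples):
        best = None
        for key, w in positive.items():
            # The random draws depend only on (seed, sample index, element),
            # so two different sets see identical randomness for a shared element.
            rng = random.Random(f"{seed}-{i}-{key}")
            r = rng.gammavariate(2.0, 1.0)
            c = rng.gammavariate(2.0, 1.0)
            beta = rng.random()
            t = math.floor(math.log(w) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            if best is None or a < best[0]:
                best = (a, key, t)
        sketch.append((best[1], best[2]))
    return sketch


def estimate_similarity(sketch_a, sketch_b):
    """Fraction of sample positions on which the two sketches agree."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


if __name__ == "__main__":
    s = {"a": 2.0, "b": 1.0}
    t = {"a": 1.0, "b": 3.0}
    k = 2000
    print("exact:", generalized_jaccard(s, t))
    print("estimated:", estimate_similarity(icws_sketch(s, k), icws_sketch(t, k)))
```

The estimate is the fraction of matching samples, so its accuracy grows with the number of samples K, while the sketching cost in this baseline grows with both K and the number of nonzero elements.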
Pages: 1125-1132
Number of pages: 8