Efficient top-K approximate searches against a relation with multiple attributes

Cited by: 1
Authors
Lu, Wei [1, 2]
Chen, Jinchuan [2]
Du, Xiaoyong [1, 2]
Wang, Jieping [3]
Pan, Wei [4]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China
[2] Minist Educ, Key Labs Data Engn & Knowledge Engn, Beijing, Peoples R China
[3] China Elect Standardizat Inst, Beijing, Peoples R China
[4] Northwestern Polytech Univ, Sch Engn & Comp Sci, Xian 710072, Peoples R China
Source
World Wide Web, 2011, 14
Funding
National Natural Science Foundation of China
Keywords
top-K queries; approximate search; data quality; algorithms
DOI
10.1007/s11280-011-0137-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we study the problem of efficiently identifying the K records that are most similar to a given query record. Similarity is defined in two steps: (1) for each record, we calculate the similarity score between the record and the query record over each individual attribute using a specific similarity function; (2) an aggregate function combines these similarity scores with weights, and the aggregated value serves as the similarity of the record. Once the similarities of all records have been computed, the K records with the greatest similarities can be identified. Under this framework, however, the computational cost becomes extremely high when the cardinality of the relation is large, because a similarity must be computed for every record. We therefore propose two efficient algorithms, named ScanIndex and Top-Down (TD for short), to cope with this problem. ScanIndex avoids computing similarity scores over individual attributes that are equal to zero. TD builds on ScanIndex and skips similarity scores over individual attributes that are less than given thresholds (rather than only those equal to zero), where these thresholds are improved dynamically over time. Experimental results demonstrate that, compared with the naive approach, ScanIndex and TD improve performance by two orders of magnitude.
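The scoring framework in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the token-set Jaccard similarity, the weighted sum as the aggregate function, the per-attribute inverted index, and all function names (`jaccard`, `naive_top_k`, `build_index`, `indexed_top_k`) are assumptions chosen for illustration. The second function only mimics the general idea of not computing per-attribute scores that would be zero; the actual ScanIndex and TD algorithms are not specified in this record.

```python
import heapq
from collections import defaultdict


def jaccard(a, b):
    """Token-set Jaccard similarity (assumed stand-in for the per-attribute function)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def _best(scored, k):
    # Highest score first; ties broken in favour of the smaller record id.
    return heapq.nlargest(k, scored, key=lambda t: (t[0], -t[1]))


def naive_top_k(records, query, weights, k):
    """Naive approach: score every record on every attribute, then keep the k best."""
    scored = []
    for rid, rec in enumerate(records):
        score = sum(w * jaccard(v, q) for w, v, q in zip(weights, rec, query))
        scored.append((score, rid))
    return _best(scored, k)


def build_index(records):
    """One inverted index per attribute: token -> set of record ids (an assumption)."""
    index = [defaultdict(set) for _ in range(len(records[0]))]
    for rid, rec in enumerate(records):
        for a, value in enumerate(rec):
            for tok in set(value.split()):
                index[a][tok].add(rid)
    return index


def indexed_top_k(records, index, query, weights, k):
    """Compute a per-attribute score only for records that share a token with the
    query on that attribute; every other per-attribute score is zero and is skipped."""
    totals = defaultdict(float)
    for a, (w, q) in enumerate(zip(weights, query)):
        candidates = set()
        for tok in set(q.split()):
            candidates |= index[a].get(tok, set())
        for rid in candidates:
            totals[rid] += w * jaccard(records[rid][a], q)
    return _best([(s, rid) for rid, s in totals.items()], k)
```

Note that `indexed_top_k` only returns records with a non-zero aggregated similarity, so it can return fewer than K results when fewer than K records overlap with the query on any attribute.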
Pages: 573-597
Page count: 25
Related papers
50 in total
  • [1] Efficient top-K approximate searches against a relation with multiple attributes
    Wei Lu
    Jinchuan Chen
    Xiaoyong Du
    Jieping Wang
    Wei Pan
    [J]. World Wide Web, 2011, 14 : 573 - 597
  • [2] Efficient Top-k Approximate Subtree Matching in Small Memory
    Augsten, Nikolaus
    Barbosa, Denilson
    Boehlen, Michael M.
    Palpanas, Themis
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (08) : 1123 - 1137
  • [3] Efficient Compressed Indexing for Approximate Top-k String Retrieval
    Ferrada, Hector
    Navarro, Gonzalo
    [J]. STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2014, 2014, 8799 : 18 - 30
  • [4] APPROXIMATE CONSISTENT WEIGHTED SAMPLING FOR EFFICIENT TOP-K SEARCH
    Kim, Yunna
    Hwang, Heasoo
    [J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2020, 16 (03) : 1125 - 1132
  • [5] Approximate distributed top-k queries
    Boaz Patt-Shamir
    Allon Shafrir
    [J]. Distributed Computing, 2008, 21 : 1 - 22
  • [6] Approximate distributed top-k queries
    Patt-Shamir, Boaz
    Shafrir, Allon
    [J]. DISTRIBUTED COMPUTING, 2008, 21 (01) : 1 - 22
  • [7] Efficient approximate top-k mutual information based feature selection
    Md Abdus Salam
    Senjuti Basu Roy
    Gautam Das
    [J]. Journal of Intelligent Information Systems, 2023, 61 : 191 - 223
  • [8] Energy Efficient Approximate Top-k Range Queries in Sensor Networks
    Wang, Yufeng
    Chen, Hong
    [J]. INTERNATIONAL JOINT CONFERENCE ON COMPUTATIONAL SCIENCES AND OPTIMIZATION, VOL 1, PROCEEDINGS, 2009, : 99 - 101
  • [9] Efficient Approximate Top-k Query Algorithm Using Cube Index
    Chen, Dongqu
    Sun, Guang-Zhong
    Gong, Neil Zhenqiang
    [J]. WEB TECHNOLOGIES AND APPLICATIONS, 2011, 6612 : 155 - 167
  • [10] Efficient approximate top-k mutual information based feature selection
    Salam, Md Abdus
    Roy, Senjuti Basu
    Das, Gautam
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2023, 61 (01) : 191 - 223