Efficient top-K approximate searches against a relation with multiple attributes

Cited by: 1

Authors:

Lu, Wei ^{[1,2]}

Chen, Jinchuan ^{[2]}

Du, Xiaoyong ^{[1,2]}

Wang, Jieping ^{[3]}

Pan, Wei ^{[4]}

Affiliations:

[1] Renmin Univ China, Sch Informat, Beijing 100872, Peoples R China

[2] Minist Educ, Key Labs Data Engn & Knowledge Engn, Beijing, Peoples R China

[3] China Elect Standardizat Inst, Beijing, Peoples R China

[4] Northwestern Polytech Univ, Sch Engn & Comp Sci, Xian 710072, Peoples R China

Source:

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2011 / Vol. 14 / Issue 5-6

Funding:

National Natural Science Foundation of China;

Keywords:

top-K queries; approximate search; data quality; ALGORITHMS;

DOI:

10.1007/s11280-011-0137-1

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, we study the problem of efficiently identifying the K records that are most similar to a given query record. Similarity is defined in two steps: (1) for each record, we calculate the similarity score between the record and the query record over each individual attribute using a specific similarity function; (2) an aggregate function combines these similarity scores with weights, and the aggregated value serves as the similarity of the record. Once the similarities of all records have been computed, the K records with the greatest similarities can be identified. Under this framework, unfortunately, the computational cost becomes extremely high when the cardinality of the relation is large, because the similarity must be computed for every record. We therefore propose two efficient algorithms, named ScanIndex and Top-Down (TD for short), to cope with this problem. ScanIndex avoids computing similarity scores that are equal to zero over individual attributes. TD builds on ScanIndex and skips similarity scores that fall below thresholds (rather than only those equal to zero) over individual attributes, where these thresholds are improved dynamically over time. Experimental results demonstrate that, compared with the naive approach, ScanIndex and TD improve performance by two orders of magnitude.
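The framework described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the token-set Jaccard function, the sample records, the weights, and every function name below are assumptions chosen for illustration. `naive_top_k` scores every record, as in the naive approach. `indexed_top_k` only scores records that share at least one token with the query on some attribute, which is loosely in the spirit of skipping zero similarity scores; the index structure that ScanIndex actually uses is not described in this record.

```python
import heapq
from collections import defaultdict


def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed per-attribute similarity function)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def similarity(record, query, weights):
    """Weighted sum of per-attribute scores (an assumed aggregate function)."""
    return sum(w * jaccard(r, q) for r, q, w in zip(record, query, weights))


def naive_top_k(records, query, weights, k):
    """Score every record, then keep the k highest (score, record_id) pairs."""
    scored = ((similarity(rec, query, weights), i) for i, rec in enumerate(records))
    return heapq.nlargest(k, scored)


def build_index(records):
    """Hypothetical inverted index: (attribute position, token) -> record ids."""
    index = defaultdict(set)
    for i, rec in enumerate(records):
        for attr, value in enumerate(rec):
            for tok in value.split():
                index[(attr, tok)].add(i)
    return index


def indexed_top_k(records, index, query, weights, k):
    """Score only records sharing a token with the query on some attribute.

    Records outside the candidate set have similarity zero on every
    attribute, so they are never scored.
    """
    candidates = set()
    for attr, value in enumerate(query):
        for tok in value.split():
            candidates |= index.get((attr, tok), set())
    scored = ((similarity(records[i], query, weights), i) for i in candidates)
    return heapq.nlargest(k, scored)


# Illustrative data: each record has a name attribute and an address attribute.
records = [
    ("john smith", "beijing china"),
    ("jon smith", "beijing"),
    ("alice wang", "xian china"),
    ("bob lee", "paris france"),
]
query = ("john smith", "beijing china")
weights = (0.6, 0.4)

top_naive = naive_top_k(records, query, weights, k=2)
top_indexed = indexed_top_k(records, build_index(records), query, weights, k=2)
```

In this sketch both functions rank records 0 and 1 first, while the indexed variant never scores record 3, which shares no token with the query. If fewer than k records have a non-zero score, `indexed_top_k` returns fewer than k results.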

Citation

Pages: 573-597

Page count: 25
