Efficient processing of similarity search under time warping in sequence databases: an index-based approach

Cited by: 28

Authors:

Kim, SW

Park, S

Chu, WW

Affiliations:

[1] Hanyang Univ, Coll Informat & Commun, Seoul 133791, South Korea

[2] Pohang Univ Sci & Technol, Dept Comp Sci & Engn, Pohang, South Korea

[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA

Source:

INFORMATION SYSTEMS | 2004 / Vol. 29 / No. 05

Funding:

National Research Foundation, Singapore;

Keywords:

similarity search; sequence database; indexing; time warping distance;

DOI:

10.1016/S0306-4379(03)00037-1

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper discusses the effective processing of similarity search that supports time warping in large sequence databases. Time warping enables sequences with similar patterns to be found even when they are of different lengths. Prior methods for similarity search under time warping could not employ multi-dimensional indexes without false dismissal, because the time warping distance does not satisfy the triangle inequality. They have to scan the entire database and therefore suffer serious performance degradation in large databases. Another method, which uses a suffix tree and does not assume any particular distance function, also performs poorly because of the large tree size. In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to enhance search performance in large databases without permitting any false dismissal. To attain this goal, we have devised a new distance function, Dtw-lb, which consistently underestimates the time warping distance and satisfies the triangle inequality. Dtw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For the efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and Dtw-lb as its distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we have performed extensive experiments. The results reveal that our method achieves a speedup of up to 43 times on a data set containing real-world S&P 500 stock data sequences, and of up to 720 times on data sets containing a very large volume of synthetic data sequences. The performance gain increases (1) as the number of data sequences increases, (2) as the average length of data sequences increases, and (3) as the tolerance in a query decreases. Considering the characteristics of real databases, these tendencies imply that our approach is suitable for practical applications. (C) 2003 Elsevier Ltd. All rights reserved.
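
The filter-and-refine idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not list the four features or the exact form of the time warping distance, so the following are assumptions made for illustration: the feature vector is taken to be (first, last, greatest, smallest) element, Dtw-lb is taken to be the L-infinity distance between feature vectors, and the time warping distance uses the L-infinity (max) form. The names `features`, `dtw_lb`, `dtw_linf`, and `range_search` are hypothetical, and a plain linear scan over the feature vectors stands in for the multi-dimensional index.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, float, float, float]


def features(seq: Sequence[float]) -> Feature:
    """Assumed 4-tuple: first, last, greatest, and smallest element."""
    return (seq[0], seq[-1], max(seq), min(seq))


def dtw_lb(fa: Feature, fb: Feature) -> float:
    """Assumed lower bound: L-infinity distance between feature vectors."""
    return max(abs(x - y) for x, y in zip(fa, fb))


def dtw_linf(s: Sequence[float], q: Sequence[float]) -> float:
    """Time warping distance in its L-infinity form (assumed here):
    the smallest achievable maximum element gap over all warping paths."""
    inf = float("inf")
    n, m = len(s), len(q)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            best = min(prev[j], cur[j - 1], prev[j - 1])
            cur[j] = max(abs(s[i - 1] - q[j - 1]), best)
        prev = cur
    return prev[m]


def range_search(db: List[Sequence[float]], query: Sequence[float],
                 eps: float) -> List[int]:
    """Filter with the cheap lower bound, then verify survivors with the
    full distance. A linear scan stands in for the index."""
    fq = features(query)
    candidates = [i for i, s in enumerate(db) if dtw_lb(features(s), fq) <= eps]
    return [i for i in candidates if dtw_linf(db[i], query) <= eps]


if __name__ == "__main__":
    db = [[1, 2, 3], [1, 1, 2, 3, 3], [5, 6, 7], [1, 2, 9, 3]]
    print(range_search(db, [1, 2, 2, 3], eps=0.5))
```

Under these assumptions the filter cannot cause a false dismissal: every warping path pairs the first elements and the last elements, and the greatest (or smallest) element of one sequence must be paired with some element of the other, so each feature gap is at most the full distance. Sequences rejected by `dtw_lb` therefore could not have passed `dtw_linf` either.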


Pages: 405 - 420

Page count: 16
