Efficient processing of similarity search under time warping in sequence databases: an index-based approach

Cited by: 28
Authors
Kim, SW
Park, S
Chu, WW
Affiliations
[1] Hanyang Univ, Coll Informat & Commun, Seoul 133791, South Korea
[2] Pohang Univ Sci & Technol, Dept Comp Sci & Engn, Pohang, South Korea
[3] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90024 USA
Funding
National Research Foundation of Singapore;
Keywords
similarity search; sequence database; indexing; time warping distance;
DOI
10.1016/S0306-4379(03)00037-1
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper discusses the effective processing of similarity search that supports time warping in large sequence databases. Time warping enables sequences with similar patterns to be found even when they are of different lengths. Prior methods for processing similarity search that supports time warping could not employ multi-dimensional indexes without false dismissal, because the time warping distance does not satisfy the triangular inequality. They therefore have to scan the entire database and suffer serious performance degradation in large databases. Another method, which employs a suffix tree and does not assume any particular distance function, also performs poorly because of the large tree size. In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to enhance the search performance in large databases without permitting any false dismissal. To attain this goal, we have devised a new distance function, Dtw-lb, which consistently underestimates the time warping distance and satisfies the triangular inequality. Dtw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For the efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and Dtw-lb as a distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we have performed extensive experiments. The results reveal that our method achieves a significant speedup: up to 43 times with a data set containing real-world S&P 500 stock data sequences, and up to 720 times with data sets containing a very large volume of synthetic data sequences. The performance gain increases (1) as the number of data sequences increases, (2) as the average length of data sequences increases, and (3) as the tolerance in a query decreases. Considering the characteristics of real databases, these tendencies imply that our approach is suitable for practical applications. © 2003 Elsevier Ltd. All rights reserved.
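The filter-and-refine idea in the abstract can be sketched roughly as follows. The abstract does not say what the 4-tuple contains or how Dtw-lb is computed, so this sketch assumes the features are the first, last, greatest, and smallest elements of a sequence and that the lower bound is the L-infinity distance between two feature vectors. It also assumes the classic time warping distance with absolute differences summed along the warping path, and it replaces the multi-dimensional index with a plain linear scan over precomputed features. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Features = Tuple[float, float, float, float]


def features(seq: Sequence[float]) -> Features:
    """Assumed 4-tuple: (first, last, greatest, smallest) of a non-empty sequence."""
    return (seq[0], seq[-1], max(seq), min(seq))


def lower_bound(f: Features, g: Features) -> float:
    """Assumed form of the bound: L-infinity distance between feature vectors."""
    return max(abs(a - b) for a, b in zip(f, g))


def dtw(s: Sequence[float], q: Sequence[float]) -> float:
    """Classic time warping distance, summing |a - b| along the best warping path."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(q)  # row for the empty prefix of s
    for a in s:
        cur = [inf] * (len(q) + 1)
        for j, b in enumerate(q, start=1):
            cur[j] = abs(a - b) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(q)]


def range_search(db: List[Sequence[float]], query: Sequence[float], eps: float) -> List[int]:
    """Return indices of sequences whose time warping distance to query is <= eps."""
    fq = features(query)
    # Filter step: a linear scan stands in for the multi-dimensional index here.
    candidates = [i for i, s in enumerate(db) if lower_bound(features(s), fq) <= eps]
    # Refine step: compute the exact distance only for the surviving candidates.
    return [i for i in candidates if dtw(db[i], query) <= eps]


db = [[1, 2, 3], [1, 1, 2, 2, 3, 3], [5, 6, 7], [1, 2, 9]]
query = [1, 2, 3]
print(range_search(db, query, eps=0.5))  # prints [0, 1]
```

Under these assumptions the bound never exceeds the exact distance: every warping path must pair the two first elements and the two last elements, and the greatest (or smallest) element of one sequence is always paired with some element that differs from it by at least the gap between the two maxima (or minima). The filter step therefore discards only sequences that cannot qualify, which is the no-false-dismissal property the abstract refers to.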
Pages: 405-420
Page count: 16