An index-based approach for similarity search supporting time warping in large sequence databases

Cited by: 161
Authors
Kim, SW [1 ]
Park, S [1 ]
Chu, WW [1 ]
Affiliations
[1] Kangwon Natl Univ, Dept Comp Informat & Commun Engn, Chunchon, South Korea
DOI
10.1109/ICDE.2001.914875
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a novel method for similarity search that supports time warping in large sequence databases. Time warping makes it possible to find sequences with similar patterns even when they are of different lengths. Previous methods for similarity search under time warping cannot employ multi-dimensional indexes without false dismissal, since the time warping distance does not satisfy the triangular inequality. Our primary goal is to improve search performance without permitting any false dismissal. To attain this goal, we devise a new distance function, Dtw-lb, that consistently underestimates the time warping distance and also satisfies the triangular inequality. Dtw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and Dtw-lb as the distance function. Extensive experimental results show that our method achieves a significant speedup of up to 43 times with real-world S&P 500 stock data and up to 720 times with very large synthetic data.
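The filter-and-refine idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the four features or the exact form of the time warping distance, so the code below makes two assumptions: the feature vector is taken to be (first, last, greatest, smallest) element of a sequence, and the time warping distance takes the maximum element-wise difference along the best warping path. The function names `features`, `dtw_lb`, `dtw_max`, and `range_search` are hypothetical, and a linear scan stands in for the multi-dimensional index.

```python
def features(seq):
    # Assumed 4-tuple: first, last, greatest, and smallest element.
    # None of these change when elements are repeated by warping.
    return (seq[0], seq[-1], max(seq), min(seq))


def dtw_lb(s, q):
    # Lower bound: L-infinity distance between the two feature vectors.
    # As a max-norm on fixed-length vectors it obeys the triangular inequality.
    return max(abs(a - b) for a, b in zip(features(s), features(q)))


def dtw_max(s, q):
    # Time warping distance (assumed variant): the smallest achievable
    # maximum |s[i] - q[j]| over all warping paths, by dynamic programming.
    n, m = len(s), len(q)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = max(abs(s[i - 1] - q[j - 1]), best_prev)
    return d[n][m]


def range_search(db, query, eps):
    # Filter step: the cheap lower bound discards sequences that cannot
    # qualify. Because dtw_lb never exceeds dtw_max, no true answer is lost.
    candidates = [s for s in db if dtw_lb(s, query) <= eps]
    # Refine step: the exact distance removes the remaining false alarms.
    return [s for s in candidates if dtw_max(s, query) <= eps]
```

Under these assumptions, a query such as `range_search([[1, 2, 2, 3], [1, 5, 3], [10, 11, 12]], [1, 2, 3], 1)` only needs the quadratic-time `dtw_max` for sequences that survive the constant-size feature comparison. In an index-based setting, the feature vectors would be stored in a multi-dimensional index rather than scanned linearly.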
Pages: 607-614
Number of pages: 8