Similarity search over incomplete symbolic sequences

Cited by: 0
Authors
Gu, Jie [1]
Jin, Xiaoming [1]
Affiliation
[1] Tsinghua University, School of Software, Beijing, People's Republic of China
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Reliably measuring the similarity between symbolic sequences is an important problem in databases and data mining. Many distance functions have been developed for symbolic sequence data in recent years. However, most of them measure the distance between complete symbolic sequences, while distance measurement for incomplete symbolic sequences remains unexplored. In this paper, we propose a method for similarity search over incomplete symbolic sequences. Without any knowledge of the positions and values of the missing elements, it is impossible to obtain the exact distance between a query sequence and an incomplete sequence. Instead of calculating this exact distance, we map a pair of symbolic sequences to a real-valued interval; that is, we propose a lower bound and an upper bound on the underlying exact distance between a query sequence and an incomplete sequence. With these bounds, similarity search can be conducted with guaranteed performance in terms of either recall or precision. The proposed method is also extended to handle real-valued sequence data. Experimental results on both synthetic and real-world data show that the method is both efficient and effective.
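The abstract does not state which distance function is used or how the bounds are derived. The following is a minimal sketch of the interval idea under two loudly stated assumptions that are not from the paper: the distance is Levenshtein edit distance, and the number k of missing elements is known (their positions and values are not). Because the observed sequence is the full sequence with exactly k symbols removed, their edit distance is exactly k, so the triangle inequality brackets the unknown true distance. The names `distance_bounds`, `range_search`, and the `(id, observed, k)` record layout are hypothetical and for illustration only; this is not the authors' method.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout (assumption): (sequence id, observed incomplete
# sequence, assumed-known number k of missing elements).
Record = Tuple[str, Sequence[str], int]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[-1]


def distance_bounds(query: Sequence[str], observed: Sequence[str],
                    k: int) -> Tuple[int, int]:
    """Return (lo, hi) with lo <= edit_distance(query, full) <= hi for every
    full sequence obtained by inserting k symbols anywhere into `observed`.

    Assumes edit_distance(observed, full) == k, so by the triangle inequality
    d - k <= true distance <= d + k; the length difference between query and
    full gives a second lower bound.
    """
    d = edit_distance(query, observed)
    full_len = len(observed) + k
    lo = max(d - k, abs(len(query) - full_len))
    hi = d + k
    return lo, hi


def range_search(query: Sequence[str], database: List[Record], eps: int,
                 mode: str = "recall") -> List[str]:
    """Range query with threshold eps over incomplete sequences."""
    if mode not in ("recall", "precision"):
        raise ValueError("mode must be 'recall' or 'precision'")
    hits = []
    for seq_id, observed, k in database:
        lo, hi = distance_bounds(query, observed, k)
        # recall mode filters on the lower bound: no true match is dropped.
        # precision mode filters on the upper bound: every hit is a true match.
        if (lo if mode == "recall" else hi) <= eps:
            hits.append(seq_id)
    return hits


if __name__ == "__main__":
    db = [("s1", "ABD", 1), ("s2", "XYZ", 1), ("s3", "ABCD", 0)]
    print(range_search("ABCD", db, eps=1, mode="recall"))     # ['s1', 's3']
    print(range_search("ABCD", db, eps=1, mode="precision"))  # ['s3']
```

In this sketch, filtering on the lower bound may return false positives (here `s1`, whose true distance could be as large as 2) but never misses a true match; filtering on the upper bound returns only guaranteed matches but may miss some. The extension to real-valued sequences mentioned in the abstract is not covered.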
Pages: 339 / +
Number of pages: 2
Related papers (50 in total)
  • [31] Efficient Graph Similarity Search Over Large Graph Databases
    Zheng, Weiguo
    Zou, Lei
    Lian, Xiang
    Wang, Dong
    Zhao, Dongyan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (04) : 964 - 978
  • [32] Efficient similarity search over future stream time series
    Lian, Xiang
    Chen, Lei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (01) : 40 - 54
  • [33] Semantic SPARQL Similarity Search Over RDF Knowledge Graphs
    Zheng, Weiguo
    Zou, Lei
    Peng, Wei
    Yan, Xifeng
    Song, Shaoxu
    Zhao, Dongyan
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 9 (11) : 840 - 851
  • [34] Towards Representation Independent Similarity Search Over Graph Databases
    Chodpathumwan, Yodsawalai
    Aleyasen, Amirhossein
    Termehchy, Arash
    Sun, Yizhou
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016 : 2233 - 2238
  • [35] Dynamic Similarity Search over Encrypted Data with Low Leakage
    Homann, Daniel
    Goege, Christian
    Wiese, Lena
    SECURITY AND TRUST MANAGEMENT (STM 2017), 2017, 10547 : 19 - 35
  • [36] Accounting for Language Changes Over Time in Document Similarity Search
    Morsy, Sara
    Karypis, George
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2016, 35 (01)
  • [37] Similarity Search in Metric Space over Content Addressable Network
    Dong, Dafan
    Wu, Ying
    Wang, Xuefei
    Luo, Tao
    Huang, Guowei
    Wu, Gongyi
    HPCC 2008: 10TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2008 : 702 - 707
  • [38] Similarity of Private Keyword Search over Encrypted Document Collection
    Elmehdwi, Yousef
    Jiang, Wei
    Hurson, Ali
    ADVANCES IN COMPUTERS, VOL 94, 2014, 94 : 71 - 102
  • [39] Approximate similarity search over multiple stream time series
    Lian, Xiang
    Chen, Lei
    Wang, Bin
    ADVANCES IN DATABASES: CONCEPTS, SYSTEMS AND APPLICATIONS, 2007, 4443 : 962 - +
  • [40] Symbolic computation with sequences
    Petkovšek, M.
    PROGRAMMING AND COMPUTER SOFTWARE, 2006, 32 : 65 - 70