Leveraging deletion neighborhoods and trie for efficient string similarity search and join

Cited by: 1
Authors
Cui, Jia [1, 3]
Meng, Dan [2]
Chen, Zhong-Tao [1, 3]
Affiliations
[1] Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
[2] Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China
[3] University of Chinese Academy of Sciences, Beijing, China
Keywords
Deletion neighborhoods; Edit distance; Similarity join; Similarity search; Trie
DOI
10.1007/978-3-319-12844-3_1
Abstract
String similarity search and join are primitive operations in databases and information retrieval for addressing poor data quality. Because of the high complexity of deletion neighborhoods, existing methods resort to hashing schemes to reduce the space requirement of the index. However, the hash collisions this introduces must be verified by costly edit distance computation. In this paper, we focus on achieving faster query speed with affordable memory consumption. We propose a novel method that leverages deletion neighborhoods and a trie to answer edit-distance-based string similarity queries efficiently. We use the trie to share common prefixes of deletion neighborhoods and propose a subtree merging optimization to reduce the index size. We then discuss index partition strategies and propose a bit-vector-based verification method to speed up queries. The experimental results show that our method outperforms state-of-the-art methods on real data. © Springer International Publishing Switzerland 2014.
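The core idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It relies on a standard property: if two strings are within edit distance k, their k-deletion neighborhoods share at least one string. The sketch stores every deletion variant in a plain nested-dictionary trie so that common prefixes are shared, then verifies candidates with an ordinary dynamic-programming edit distance. The subtree merging, index partitioning, and bit-vector verification described in the paper are not reproduced here, and all function names are hypothetical.

```python
from itertools import combinations


def deletion_neighborhood(s, k):
    """Return every string obtained from s by deleting at most k characters."""
    variants = set()
    for d in range(min(k, len(s)) + 1):
        for positions in combinations(range(len(s)), d):
            drop = set(positions)
            variants.add("".join(c for i, c in enumerate(s) if i not in drop))
    return variants


def edit_distance(a, b):
    """Plain dynamic-programming Levenshtein distance (stand-in verifier)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_trie(strings, k):
    """Insert each deletion variant into a nested-dict trie.

    The key None marks the end of a variant and maps to the set of
    original strings that produced it (an assumption of this sketch).
    """
    root = {}
    for s in strings:
        for variant in deletion_neighborhood(s, k):
            node = root
            for ch in variant:
                node = node.setdefault(ch, {})
            node.setdefault(None, set()).add(s)
    return root


def search(root, query, k):
    """Return indexed strings within edit distance k of query, sorted."""
    candidates = set()
    for variant in deletion_neighborhood(query, k):
        node = root
        for ch in variant:
            node = node.get(ch)
            if node is None:
                break
        else:
            # The whole variant was matched; collect strings ending here.
            candidates |= node.get(None, set())
    # Shared variants are only a filter, so every candidate is verified.
    return sorted(s for s in candidates if edit_distance(s, query) <= k)


if __name__ == "__main__":
    trie = build_trie(["trie", "tree", "tried", "join"], k=1)
    print(search(trie, "trie", 1))
```

The number of deletion variants grows quickly with the string length and k, which is the space cost the abstract refers to; sharing prefixes in a trie is one way to contain it.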
Related papers
50 in total
  • [2] Leveraging Deletion Neighborhoods and Trie for Efficient String Similarity Search and Join
    Cui, Jia
    Meng, Dan
    Chen, Zhong-Tao
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, AIRS 2014, 2014, 8870 : 1 - 13
  • [3] Trie-join: a trie-based method for efficient string similarity joins
    Jianhua Feng
    Jiannan Wang
    Guoliang Li
    [J]. The VLDB Journal, 2012, 21 : 437 - 461
  • [4] Trie-join: a trie-based method for efficient string similarity joins
    Feng, Jianhua
    Wang, Jiannan
    Li, Guoliang
    [J]. VLDB JOURNAL, 2012, 21 (04) : 437 - 461
  • [5] Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints
    Wang, Jiannan
    Feng, Jianhua
    Li, Guoliang
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2010, 3 (01) : 1219 - 1230
  • [6] String similarity search and join: a survey
    Minghe Yu
    Guoliang Li
    Dong Deng
    Jianhua Feng
    [J]. Frontiers of Computer Science, 2016, 10 : 399 - 417
  • [7] String similarity search and join: a survey
    Yu, Minghe
    Li, Guoliang
    Deng, Dong
    Feng, Jianhua
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2016, 10 (03) : 399 - 417
  • [8] String similarity search and join: a survey
    Minghe YU
    Guoliang LI
    Dong DENG
    Jianhua FENG
    [J]. Frontiers of Computer Science, 2016, 10 (03) : 399 - 417
  • [9] Highly Efficient String Similarity Search and Join over Compressed Indexes
    Xiao, Guorui
    Wang, Jin
    Lin, Chunbin
    Zaniolo, Carlo
    [J]. 2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 232 - 244
  • [10] EFFICIENT STRING EDIT SIMILARITY JOIN ALGORITHM
    Gouda, Karam
    Rashad, Metwally
    [J]. COMPUTING AND INFORMATICS, 2017, 36 (03) : 683 - 704