Fast Algorithms for Top-k Approximate String Matching

Cited by: 0
Authors
Yang, Zhenglu [1 ]
Yu, Jianjun [2 ]
Kitsuregawa, Masaru [1 ]
Affiliations
[1] Univ Tokyo, Inst Ind Sci, Tokyo 1138654, Japan
[2] Chinese Acad Sci, Comp Network Informat Ctr, Beijing, Peoples R China
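Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract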
Top-k approximate querying on string collections is an important data analysis tool for many applications, and it has been extensively studied. However, the scale of the problem has increased dramatically because of the prevalence of the Web. In this paper, we explore the problem of efficient top-k similar string matching. Several efficient strategies are introduced, such as length-aware and adaptive q-gram selection. We present a general q-gram-based framework and propose two efficient algorithms based on these strategies. Our techniques are experimentally evaluated on three real data sets and show superior performance.
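As a rough illustration of the top-k problem described in the abstract, the following Python sketch returns the k strings closest to a query under edit distance. It is not the paper's algorithm: it uses only a simple length filter (the length difference of two strings is a lower bound on their edit distance) and no q-gram index or adaptive q-gram selection. The function names `edit_distance` and `top_k_similar` are hypothetical and chosen for this example only.

```python
import heapq


def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def top_k_similar(query, strings, k):
    """Return the k strings nearest to query as sorted (distance, string) pairs.

    Assumption: a plain length filter stands in for the paper's strategies.
    """
    if k <= 0:
        return []
    # Visit candidates in order of increasing length difference, which is
    # a lower bound on the edit distance to the query.
    ordered = sorted(strings, key=lambda s: abs(len(s) - len(query)))
    heap = []  # max-heap on distance, stored as (-distance, string)
    for s in ordered:
        lower_bound = abs(len(s) - len(query))
        if len(heap) == k and lower_bound >= -heap[0][0]:
            break  # no remaining string can beat the current k-th best
        d = edit_distance(query, s)
        if len(heap) < k:
            heapq.heappush(heap, (-d, s))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, s))
    return sorted((-nd, s) for nd, s in heap)
```

For example, `top_k_similar("kitten", ["sitting", "kitchen", "mitten", "kit", "zebra", "kittens"], 2)` stops before computing the distance to "kit", because its length difference of 3 already exceeds the current second-best distance.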
Pages: 1467 - 1473
Page count: 7
Related papers
50 in total
  • [31] Very fast and simple approximate string matching
    Navarro, G
    Baeza-Yates, R
    [J]. INFORMATION PROCESSING LETTERS, 1999, 72 (1-2) : 65 - 70
  • [32] Fast approximate string matching with finite automata
    Hulden, Mans
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2009, (43) : 57 - 64
  • [33] Fast Convolutions and Their Applications in Approximate String Matching
    Fredriksson, Kimmo
    Grabowski, Szymon
    [J]. COMBINATORIAL ALGORITHMS, 2009, 5874 : 254 - +
  • [34] Diversified Top-k Graph Pattern Matching
    Fan, Wenfei
    Wang, Xin
    Wu, Yinghui
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (13): : 1510 - 1521
  • [35] A Fast Approximate String Matching Algorithm on GPU
    Nunes, Lucas S. N.
    Bordim, J. L.
    Nakano, K.
    Ito, Y.
    [J]. PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2015 : 188 - 192
  • [36] A FAST VLSI SOLUTION FOR APPROXIMATE STRING MATCHING
    GROSSI, R
    [J]. INTEGRATION-THE VLSI JOURNAL, 1992, 13 (02) : 195 - 206
  • [37] Fast bit-vector algorithms for approximate string matching under indel distance
    Hyyrö, H
    Pinzon, Y
    Shinohara, A
    [J]. SOFSEM 2005: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2005, 3381 : 380 - 384
  • [38] Diversified Top-k Spatial Pattern Matching
    Xie, Jiahua
    Chen, Hongmei
    Wang, Lizhen
    [J]. SPATIAL DATA AND INTELLIGENCE, SPATIALDI 2022, 2022, 13614 : 87 - 98
  • [39] Top-k String Auto-Completion with Synonyms
    Xu, Pengfei
    Lu, Jiaheng
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2017), PT II, 2017, 10178 : 202 - 218
  • [40] String indexing for top-k close consecutive occurrences
    Bille, Philip
    Gortz, Inge Li
    Pedersen, Max Rishoj
    Rotenberg, Eva
    Steiner, Teresa Anna
    [J]. THEORETICAL COMPUTER SCIENCE, 2022, 927 : 133 - 147