Fast Algorithms for Top-k Approximate String Matching

Cited by: 0
Authors
Yang, Zhenglu [1 ]
Yu, Jianjun [2 ]
Kitsuregawa, Masaru [1 ]
Affiliations
[1] Univ Tokyo, Inst Ind Sci, Tokyo 1138654, Japan
[2] Chinese Acad Sci, Comp Network Informat Ctr, Beijing, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Top-k approximate querying on string collections is an important data analysis tool for many applications, and it has been extensively studied. However, the scale of the problem has increased dramatically because of the prevalence of the Web. In this paper, we explore the problem of efficient top-k similar string matching. We introduce several efficient strategies, such as length-aware and adaptive q-gram selection. We present a general q-gram-based framework and propose two efficient algorithms based on these strategies. Our techniques are experimentally evaluated on three real data sets and show superior performance.
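The abstract names the ingredients (q-grams, length awareness, top-k) without giving details. As a rough illustration only, the sketch below shows how a length filter and a q-gram count filter can prune candidates in a top-k edit-distance search. It is not the paper's algorithm: the function names (`qgram_counts`, `edit_distance`, `top_k`), the fixed `q=2`, the use of edit distance as the similarity measure, and the linear scan instead of an index are all assumptions made for this example.

```python
import heapq
from collections import Counter


def qgram_counts(s, q=2):
    """Multiset of the q-grams of s (no padding characters)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def top_k(query, strings, k, q=2):
    """Return the k strings closest to query as sorted (distance, string) pairs.

    Hypothetical helper for illustration; ties at the k-th distance are
    resolved in favour of the string seen first.
    """
    if k <= 0:
        return []
    query_grams = qgram_counts(query, q)
    heap = []  # max-heap via negated distances: (-distance, string)
    # Length-aware ordering: visit strings with the smallest length gap first.
    for s in sorted(strings, key=lambda t: abs(len(t) - len(query))):
        if len(heap) == k:
            worst = -heap[0][0]
            # Length filter: edit distance is at least the length difference,
            # and every remaining string has an equal or larger difference.
            if abs(len(s) - len(query)) >= worst:
                break
            # Count filter: strings within distance tau share at least
            # max(len) - q + 1 - q * tau q-grams. To beat the current worst
            # result, s needs distance <= worst - 1.
            needed = max(len(s), len(query)) - q + 1 - q * (worst - 1)
            shared = sum((query_grams & qgram_counts(s, q)).values())
            if shared < needed:
                continue
        d = edit_distance(query, s)
        if len(heap) < k:
            heapq.heappush(heap, (-d, s))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, s))
    return sorted((-neg_d, s) for neg_d, s in heap)


words = ["kitten", "mitten", "sitting", "kitchen", "banana"]
print(top_k("kitten", words, k=3))
```

Both filters only skip strings that cannot improve on the current k-th best result, so the output matches a brute-force scan whenever the k-th distance is unique.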
Pages: 1467 - 1473
Page count: 7
Related papers
50 in total
  • [1] Fast, Expressive Top-k Matching
    Culhane, William
    Jayaram, K. R.
    Eugster, Patrick
    [J]. ACM/IFIP/USENIX MIDDLEWARE 2014, 2014, : 73 - 84
  • [2] TASM: Top-k Approximate Subtree Matching
    Augsten, Nikolaus
    Barbosa, Denilson
    Boehlen, Michael
    Palpanas, Themis
    [J]. 26TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING ICDE 2010, 2010, : 353 - 364
  • [3] Fast algorithms for approximate circular string matching
    Carl Barton
    Costas S Iliopoulos
    Solon P Pissis
    [J]. Algorithms for Molecular Biology, 9
  • [4] Fast algorithms for approximate circular string matching
    Barton, Carl
    Iliopoulos, Costas S.
    Pissis, Solon P.
    [J]. ALGORITHMS FOR MOLECULAR BIOLOGY, 2014, 9
  • [5] Efficient Compressed Indexing for Approximate Top-k String Retrieval
    Ferrada, Hector
    Navarro, Gonzalo
    [J]. STRING PROCESSING AND INFORMATION RETRIEVAL, SPIRE 2014, 2014, 8799 : 18 - 30
  • [6] Efficient Top-k Approximate Subtree Matching in Small Memory
    Augsten, Nikolaus
    Barbosa, Denilson
    Boehlen, Michael M.
    Palpanas, Themis
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (08) : 1123 - 1137
  • [7] ALGORITHMS FOR APPROXIMATE STRING MATCHING
    UKKONEN, E
    [J]. INFORMATION AND CONTROL, 1985, 64 (1-3): : 100 - 118
  • [8] FAST APPROXIMATE STRING MATCHING
    OWOLABI, O
    MCGREGOR, DR
    [J]. SOFTWARE-PRACTICE & EXPERIENCE, 1988, 18 (04): : 387 - 393
  • [9] Parallel Algorithms for Approximate String Matching with k Mismatches on CUDA
    Liu, Yu
    Guo, Longjiang
    Li, Jinbao
    Ren, Meirui
    Li, Keqin
    [J]. 2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 2414 - 2421
  • [10] Top-k String Similarity Joins
    Qi, Shuyao
    Bouros, Panagiotis
    Mamoulis, Nikos
    [J]. PROCEEDINGS OF THE 32TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, SSDBM 2020, 2020,