Design of Fast Multiple String Searching Based on Improved Prefix Tree

Cited by: 1
Authors
Cheng, Yu [1]
Zhang, Tao [2]
Affiliations
[1] Tsinghua Univ, Dept Biomed Engn, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Keywords
Multi-string matching; prefix tree; string pattern
DOI
10.1109/WKDD.2010.138
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multi-string matching is one of the most important components of data mining tasks. New applications in many technology fields require high-performance string matching algorithms. This paper first presents a new string searching approach based on a data structure called a prefix tree. The algorithm eliminates the functional overlap of the HASH table and the Prefix Function. We then make a small improvement to the prefix tree and present a second algorithm that is faster and uses less space. Analysis shows that the two algorithms inherit the optimality and are very competitive in practice. In tests on both real-life and synthetic data, the algorithms are also efficient, and they are especially effective for varied string patterns and large alphabets.
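The abstract does not give the algorithms themselves, so the following is only a minimal sketch of the general idea of searching for several patterns at once with a plain prefix tree (trie). It is not the authors' method and includes neither of their improvements; the names `Node`, `build_prefix_tree`, and `find_all` are hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class Node:
    """One prefix-tree node (hypothetical helper, not from the paper)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.pattern: Optional[str] = None  # set when a pattern ends here


def build_prefix_tree(patterns: Iterable[str]) -> Node:
    """Insert every non-empty pattern into a shared prefix tree."""
    root = Node()
    for pattern in patterns:
        if not pattern:
            continue  # empty patterns would match everywhere; skip them
        node = root
        for ch in pattern:
            node = node.children.setdefault(ch, Node())
        node.pattern = pattern
    return root


def find_all(text: str, root: Node) -> List[Tuple[int, str]]:
    """Return (start_index, pattern) for every occurrence in text.

    Naive scan: walk the tree from each text position until no child
    matches. Worst case is O(len(text) * longest pattern).
    """
    matches: List[Tuple[int, str]] = []
    for start in range(len(text)):
        node = root
        for pos in range(start, len(text)):
            nxt = node.children.get(text[pos])
            if nxt is None:
                break
            node = nxt
            if node.pattern is not None:
                matches.append((start, node.pattern))
    return matches


tree = build_prefix_tree(["he", "she", "his", "hers"])
print(find_all("ushers", tree))
```

Because patterns with a common prefix share nodes, a single walk from one text position checks all of them together. Using a dictionary for `children` keeps memory proportional to the characters actually present, which is one reasonable choice for large alphabets; whether the paper uses a similar layout is not stated in the abstract.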
Pages: 111-114
Page count: 4
Related papers
50 in total
  • [21] Stringlish: improved English string searching in binary files
    Aycock, J.
    SOFTWARE-PRACTICE & EXPERIENCE, 2015, 45 (11): : 1591 - 1595
  • [22] Robust and Fast Phonetic String Matching Method for Lyric Searching Based on Acoustic Distance
    Xu, Xin
    Kato, Tsuneo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (09): : 2501 - 2509
  • [23] Improved Hexagon-based Searching Algorithm for Fast Motion Estimation
    He, Wenwei
    Zhang, Yuling
    2010 6TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS NETWORKING AND MOBILE COMPUTING (WICOM), 2010,
  • [24] An improved prefix labeling scheme: A binary string approach for dynamic ordered XML
    Li, CQ
    Ling, TW
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2005, 3453 : 125 - 137
  • [25] MRCSI: Compressing and Searching String Collections with Multiple References
    Wandelt, Sebastian
    Leser, Ulf
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2015, 8 (05): : 461 - 472
  • [26] Fast string matching for multiple searches
    Fenwick, P
    SOFTWARE-PRACTICE & EXPERIENCE, 2001, 31 (09): : 815 - 833
  • [27] Multiple Item Support Constraints Based Frequent Pattern Mining Using Dynamic Prefix Tree
    Biswas, Sudarsan
    Saha, Diganta
    Pandit, Rajat
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2025, 33 (02) : 143 - 172
  • [28] An Improved Fast Decision Tree Algorithm Based on Attribute Deviation
    Liu, Dan
    Zhang, Yue
    Sui, Xin
    Zeng, Yan
    Wang, Huan
    Li, Li
    FUZZY SYSTEMS AND DATA MINING III (FSDM 2017), 2017, 299 : 441 - 446
  • [29] A Pivotal Prefix Based Filtering Algorithm for String Similarity Search
    Deng, Dong
    Li, Guoliang
    Feng, Jianhua
    SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014, : 673 - 684
  • [30] Multi-relational Sequence Pattern Mining Method Based on Improved Prefix Tree in the Star Model
    Bao, Wenyan
    Yin, Jiang
    Li, Chen
    Zhang, Yinjuan
    Li, Yun
    2013 10TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA 2013), 2013, : 435 - 439