THE STATISTICAL DICTIONARY-BASED STRING MATCHING PROBLEM

Citations: 0
Authors
Suri, M. [1 ]
Rini, S. [1 ]
Affiliation
[1] Natl Chiao Tung Univ, Elect & Comp Engn Dept, Hsinchu, Taiwan
Keywords
Dictionary-based string matching; Content based retrieval; Indexing database; Information retrieval; Phrase searching; SEARCH
DOI
10.1109/iwcit.2019.8731626
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In the Dictionary-based String Matching (DSM) problem, an Information Retrieval (IR) system has access to a source sequence and stores the positions of a certain number of strings in a posting table. When a user asks for the position of a string, the IR system, instead of searching the source sequence directly, relies on the posting table to answer the query more efficiently. In this paper, the Statistical DSM problem is proposed as a statistical and information-theoretic formulation of the classic DSM problem in which both the source and the query have a statistical description, while the strings stored in the posting sequence are described as a code. Through this formulation, we define the communication efficiency of the IR system as the average cost of retrieving the entries of the posting list from the posting table, in the limit of an infinitely long source sequence. This formulation is used to study the communication efficiency for the case in which the dictionary is composed of (i) all the strings of a given length, referred to as k-grams, and (ii) run-length codes.
Pages: 6
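The posting-table idea described in the abstract can be illustrated with a minimal sketch for the k-gram dictionary case. Everything here is an assumption made for illustration and is not taken from the paper: the function names `build_posting_table`, `lookup`, and `mean_posting_list_length` are hypothetical, and the average posting-list length is only a crude stand-in for retrieval cost, not the paper's definition of communication efficiency.

```python
from collections import defaultdict


def build_posting_table(source: str, k: int) -> dict[str, list[int]]:
    """Map every k-gram occurring in `source` to its start positions (ascending)."""
    table: dict[str, list[int]] = defaultdict(list)
    for i in range(len(source) - k + 1):
        table[source[i:i + k]].append(i)
    return dict(table)


def lookup(table: dict[str, list[int]], query: str) -> list[int]:
    """Answer a query from the posting table alone, without rescanning the source."""
    return table.get(query, [])


def mean_posting_list_length(table: dict[str, list[int]]) -> float:
    """Illustrative cost proxy (an assumption): average number of positions per k-gram."""
    if not table:
        return 0.0
    return sum(len(positions) for positions in table.values()) / len(table)


source = "abracadabra"
table = build_posting_table(source, k=2)
print(lookup(table, "ab"))   # positions where the 2-gram "ab" starts
print(lookup(table, "zz"))   # a 2-gram that never occurs yields an empty list
print(mean_posting_list_length(table))
```

In this toy setting, a query is answered by a single dictionary lookup rather than by a scan of the source, which is the trade-off the DSM problem describes: storage of a posting table in exchange for faster queries.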