THE STATISTICAL DICTIONARY-BASED STRING MATCHING PROBLEM

Authors
Suri, M. [1]
Rini, S. [1]
Affiliations
[1] Natl Chiao Tung Univ, Elect & Comp Engn Dept, Hsinchu, Taiwan
Keywords
Dictionary-based string matching; Content based retrieval; Indexing database; Information retrieval; Phrase searching; Search
DOI
10.1109/iwcit.2019.8731626
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In the Dictionary-based String Matching (DSM) problem, an Information Retrieval (IR) system has access to a source sequence and stores the positions of a certain number of strings in a posting table. When a user requests the position of a string, the IR system, instead of searching the source sequence directly, relies on the posting table to answer the query more efficiently. In this paper, the Statistical DSM problem is proposed as a statistical and information-theoretic formulation of the classic DSM problem in which both the source and the query have a statistical description, while the strings stored in the posting sequence are described as a code. Through this formulation, we define the communication efficiency of the IR system as the average cost of retrieving the entries of the posting list from the posting table, in the limit of an infinitely long source sequence. This formulation is used to study the communication efficiency for the case in which the dictionary is composed of (i) all the strings of a given length, referred to as k-grams, and (ii) run-length codes.
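The classic DSM setting described in the abstract can be illustrated with a minimal Python sketch for the k-gram dictionary case. This is not the authors' formulation or code: the function names `build_posting_table` and `find`, the choice of overlapping k-grams, and the intersection strategy are all assumptions made for illustration, and the sketch does not model the statistical source, the query distribution, or the communication cost studied in the paper.

```python
from collections import defaultdict


def build_posting_table(source: str, k: int) -> dict[str, list[int]]:
    """Map every k-gram of `source` to the sorted list of its start positions.

    Hypothetical helper: the dictionary is assumed to contain all
    overlapping strings of length k that occur in the source.
    """
    table = defaultdict(list)
    for i in range(len(source) - k + 1):
        table[source[i:i + k]].append(i)
    return dict(table)


def find(query: str, table: dict[str, list[int]], k: int) -> list[int]:
    """Return all start positions of `query`, consulting only the posting table.

    The query occurs at position p exactly when its j-th k-gram occurs at
    position p + j for every offset j, so each posting list is shifted back
    by j and the shifted lists are intersected.
    """
    if len(query) < k:
        raise ValueError("query must be at least k characters long")
    candidates = None
    for j in range(len(query) - k + 1):
        shifted = {p - j for p in table.get(query[j:j + k], [])}
        candidates = shifted if candidates is None else candidates & shifted
        if not candidates:
            return []  # some k-gram of the query never lines up
    return sorted(candidates)


if __name__ == "__main__":
    table = build_posting_table("abracadabra", k=2)
    print(find("abra", table, k=2))
    print(find("cad", table, k=2))
```

In this sketch the source sequence is read only once, when the table is built; each query is then answered from the posting lists alone, which is the access pattern the abstract attributes to the IR system.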
Pages: 6