Text sparsification via local maxima

Cited by: 3
Authors
Crescenzi, P
Del Lungo, A
Grossi, R
Lodi, E
Pagli, L
Rossi, G
Affiliations
[1] Univ Siena, Dipartimento Matemat, I-53100 Siena, Italy
[2] Univ Florence, Dipartimento Sistemi & Informat, I-50134 Florence, Italy
[3] Univ Pisa, Dipartimento Informat, I-56125 Pisa, Italy
[4] Univ Roma Tor Vergata, Dipartimento Matemat, I-00133 Rome, Italy
Keywords
computational complexity; NP-completeness; pattern matching; string algorithms; text indexing data structures
DOI
10.1016/S0304-3975(03)00142-7
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In this paper we investigate some properties and algorithms related to a text sparsification technique based on the identification of local maxima in the given string. As the number of local maxima depends on the order assigned to the alphabet symbols, we first consider the case in which the order can be chosen arbitrarily. We show that finding an order that minimizes the number of local maxima in the given text string is an NP-hard problem. Then, we consider the case in which the order is fixed a priori. Even though the order is not necessarily optimal, we can exploit the property that the average number of local maxima induced by the order in an arbitrary text is approximately one third of the text length. In particular, we describe how to repeat the selection of local maxima for one or more iterations, so as to obtain a sparsified text. We show how to use this technique to filter the access to unstructured texts, which appear to have no natural division into words. Finally, we experimentally show that our approach can be used to create a space-efficient index for searching sufficiently long patterns in a DNA sequence as quickly as a full index. (C) 2003 Elsevier B.V. All rights reserved.
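The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a local maximum is an interior position whose symbol is at least as large as its left neighbor and strictly larger than its right neighbor; the abstract does not state the exact definition or how boundary positions are treated. The names `local_maxima`, `sparsify`, and `rank` are hypothetical, and the iteration simply reapplies the selection to the sequence of surviving symbols, which may differ from the paper's procedure.

```python
def local_maxima(seq, rank=None):
    """Return interior indices i with seq[i-1] <= seq[i] > seq[i+1].

    `rank` optionally maps each symbol to its position in a fixed
    alphabet order (assumption: natural symbol order when omitted).
    """
    key = (lambda c: c) if rank is None else (lambda c: rank[c])
    return [
        i
        for i in range(1, len(seq) - 1)
        if key(seq[i - 1]) <= key(seq[i]) > key(seq[i + 1])
    ]


def sparsify(text, rounds=1, rank=None):
    """Return the text positions that survive `rounds` selection passes.

    Simplified assumption: each pass looks only at the symbols kept by
    the previous pass and selects the local maxima among them.
    """
    positions = list(range(len(text)))
    for _ in range(rounds):
        symbols = [text[p] for p in positions]
        positions = [positions[i] for i in local_maxima(symbols, rank)]
    return positions


if __name__ == "__main__":
    # With the natural order A < C < G < T, positions 3 and 5 are kept.
    print(sparsify("GATTACA"))
    # A different alphabet order selects different positions.
    print(sparsify("GATTACA", rank={"A": 3, "C": 2, "G": 1, "T": 0}))
```

Changing `rank` changes which positions are kept, which is why the choice of alphabet order affects the number of local maxima.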
Pages: 341-364
Number of pages: 24
Related papers
50 in total
  • [21] Edge sparsification for graphs via meta-learning
    Wan, Guihong
    Schweitzer, Haim
    [J]. Proceedings - International Conference on Data Engineering, 2021, 2021-April : 2733 - 2738
  • [22] Distributed Symmetry Breaking on Power Graphs via Sparsification
    Maus, Yannic
    Peltonen, Saku
    Uitto, Jara
    [J]. PROCEEDINGS OF THE 2023 ACM SYMPOSIUM ON PRINCIPLES OF DISTRIBUTED COMPUTING, PODC 2023, 2023, : 157 - 167
  • [23] Sparsification of SAT and CSP Problems via Tractable Extensions
    Lagerkvist, Victor
    Wahlstrom, Magnus
    [J]. ACM TRANSACTIONS ON COMPUTATION THEORY, 2020, 12 (02)
  • [24] Compressing Dictionary Matching Index via Sparsification Technique
    Wing-Kai Hon
    Tsung-Han Ku
    Tak-Wah Lam
    Rahul Shah
    Siu-Lung Tam
    Sharma V. Thankachan
    Jeffrey Scott Vitter
    [J]. Algorithmica, 2015, 72 : 515 - 538
  • [25] SPARSIFICATION VIA COMPRESSED SENSING FOR AUTOMATIC SPEECH RECOGNITION
    Zhen, Kai
    Hieu Duy Nguyen
    Chang, Feng-Ju
    Mouchtaris, Athanasios
    Rastrow, Ariya
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6009 - 6013
  • [26] Compressing Dictionary Matching Index via Sparsification Technique
    Hon, Wing-Kai
    Ku, Tsung-Han
    Lam, Tak-Wah
    Shah, Rahul
    Tam, Siu-Lung
    Thankachan, Sharma V.
    Vitter, Jeffrey Scott
    [J]. ALGORITHMICA, 2015, 72 (02) : 515 - 538
  • [27] Solving a Tropical Optimization Problem via Matrix Sparsification
    Krivulin, Nikolai
    [J]. RELATIONAL AND ALGEBRAIC METHODS IN COMPUTER SCIENCE (RAMICS 2015), 2015, 9348 : 326 - 343
  • [28] SparRL: Graph Sparsification via Deep Reinforcement Learning
    Wickman, Ryan
    [J]. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 2521 - 2523
  • [29] Edge Sparsification for Graphs via Meta-Learning
    Wan, Guihong
    [J]. 2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021), 2021, : 2733 - 2738
  • [30] FROM LOCAL MAXIMA TO CONNECTED SKELETONS
    ARCELLI, C
    CORDELLA, LP
    LEVIALDI, S
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1981, 3 (02) : 134 - 143