Text sparsification via local maxima

Cited by: 3
Authors
Crescenzi, P
Del Lungo, A
Grossi, R
Lodi, E
Pagli, L
Rossi, G
Affiliations
[1] Univ Siena, Dipartimento Matemat, I-53100 Siena, Italy
[2] Univ Florence, Dipartimento Sistemi & Informat, I-50134 Florence, Italy
[3] Univ Pisa, Dipartimento Informat, I-56125 Pisa, Italy
[4] Univ Roma Tor Vergata, Dipartimento Matemat, I-00133 Rome, Italy
Keywords
computational complexity; NP-completeness; pattern matching; string algorithms; text indexing data structures
DOI
10.1016/S0304-3975(03)00142-7
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
In this paper, we investigate some properties and algorithms related to a text sparsification technique based on identifying the local maxima in a given string. Because the number of local maxima depends on the order assigned to the alphabet symbols, we first consider the case in which the order can be chosen arbitrarily. We show that finding an order that minimizes the number of local maxima in the given text string is an NP-hard problem. We then consider the case in which the order is fixed a priori. Even though such an order is not necessarily optimal, we can exploit the property that the average number of local maxima it induces in an arbitrary text is approximately one third of the text length. In particular, we describe how to iterate the selection of local maxima one or more times so as to obtain a sparsified text. We show how to use this technique to filter access to unstructured texts, which appear to have no natural division into words. Finally, we show experimentally that our approach can be used to create a space-efficient index for searching sufficiently long patterns in a DNA sequence as quickly as a full index. © 2003 Elsevier B.V. All rights reserved.
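The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a local maximum is taken to be an interior position whose symbol is at least as large as its left neighbour and strictly larger than its right neighbour (the paper's exact definition may differ), each further round simply reapplies that test to the sequence of symbols kept by the previous round, and the names `local_maxima`, `sparsify`, `order`, and `rounds` are hypothetical.

```python
def local_maxima(symbols, order=None):
    """Return the interior indices i with symbols[i-1] <= symbols[i] > symbols[i+1].

    `order` is an optional string listing the alphabet from smallest to
    largest symbol; when omitted, Python's default character order is used.
    The tie-breaking rule (<= on the left, > on the right) is an assumption.
    """
    if order is None:
        rank = lambda c: c
    else:
        table = {c: r for r, c in enumerate(order)}
        rank = table.__getitem__
    return [
        i
        for i in range(1, len(symbols) - 1)
        if rank(symbols[i - 1]) <= rank(symbols[i]) > rank(symbols[i + 1])
    ]


def sparsify(text, rounds=1, order=None):
    """Return the text positions that survive `rounds` rounds of selection.

    Simplifying assumption: each round compares only the symbols kept by the
    previous round, which may not match how the paper iterates the process.
    """
    positions = list(range(len(text)))
    for _ in range(rounds):
        kept = [text[p] for p in positions]
        positions = [positions[j] for j in local_maxima(kept, order)]
    return positions


if __name__ == "__main__":
    print(sparsify("ACBEADAB", rounds=1))  # positions kept after one round
    print(sparsify("ACBEADAB", rounds=2))  # positions kept after two rounds
    print(local_maxima("ACGTACGT", order="TGCA"))  # same text, reversed order
```

Changing `order` changes which positions are selected, which is the dependence on the alphabet order that the abstract refers to.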
Pages: 341-364
Number of pages: 24