Efficient Data Structures for Range Shortest Unique Substring Queries

Cited by: 3
Authors
Abedin, Paniz [1]
Ganguly, Arnab [2]
Pissis, Solon P. [3, 4]
Thankachan, Sharma V. [1]
Affiliations
[1] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA
[2] Univ Wisconsin, Dept Comp Sci, Whitewater, WI 53190 USA
[3] CWI, Life Sci & Hlth, NL-1098 XG Amsterdam, Netherlands
[4] Vrije Univ, Ctr Integrat Bioinformat, NL-1081 HV Amsterdam, Netherlands
Funding
U.S. National Science Foundation;
Keywords
shortest unique substring; suffix tree; heavy-light decomposition; range queries; geometric data structures; ALGORITHMS;
DOI
10.3390/a13110276
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Let T[1, n] be a string of length n and T[i, j] be the substring of T starting at position i and ending at position j. A substring T[i, j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given a string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T that efficiently answers online queries of the following type: given a range [α, β], return a shortest substring T[i, j] of T with exactly one occurrence in [α, β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction that allows us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(√n log^ε n) query time, where ε > 0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
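To make the query semantics concrete, the following is a naive brute-force sketch of a single range query. It is not the paper's data structure and does not achieve the stated bounds; it only serves as a reference definition. It assumes, as one reading of the abstract, that an occurrence counts as lying in the query range when it falls entirely inside T[alpha, beta]. The function name `range_sus` and the tie-breaking rule (leftmost among shortest) are hypothetical choices made for illustration.

```python
def range_sus(text, alpha, beta):
    """Brute-force range shortest unique substring (illustrative only).

    Positions are 1-indexed and inclusive, as in the abstract.
    Assumption: an occurrence is "in [alpha, beta]" when it lies
    entirely inside text[alpha..beta].
    Returns (i, j) for a shortest such substring, or None if the
    range is empty.
    """
    window = text[alpha - 1:beta]  # convert to Python's 0-indexed slice
    m = len(window)
    for length in range(1, m + 1):  # try shorter lengths first
        counts = {}       # substring -> number of occurrences in window
        first_start = {}  # substring -> 0-indexed start of first occurrence
        for s in range(m - length + 1):
            sub = window[s:s + length]
            counts[sub] = counts.get(sub, 0) + 1
            first_start.setdefault(sub, s)
        # dicts keep insertion order, so this scans candidates left to right
        for sub, c in counts.items():
            if c == 1:
                i = alpha + first_start[sub]
                return (i, i + length - 1)
    return None


if __name__ == "__main__":
    print(range_sus("abab", 1, 4))
    print(range_sus("abab", 2, 4))
```

For the string "abab" and the full range [1, 4], both single letters repeat, so the sketch falls through to length 2 and reports T[2, 3] = "ba". Each query costs time polynomial in the range length, which is the cost the data structures described in the abstract are designed to avoid.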
Pages: 1 / 9
Page count: 9