Space-Efficient String Mining under Frequency Constraints

Cited by: 12
Authors
Fischer, Johannes [1]
Makinen, Veli [2]
Valimaki, Niko [2]
Affiliations
[1] Univ Tubingen, Ctr Bioinformat ZBIT, Sand 14, D-72076 Tubingen, Germany
[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
Funding
Academy of Finland
DOI
10.1109/ICDM.2008.32
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Let D_1 and D_2 be two databases (i.e., multisets) of d strings, over an alphabet Σ, with overall length n. We study the problem of mining discriminative patterns between D_1 and D_2, e.g., patterns that are frequent in one database but not in the other, emerging patterns, or patterns satisfying other frequency-related constraints. Using the algorithmic framework by Hui (CPM 1992), one can solve several variants of this problem in optimal linear time with the aid of suffix trees or suffix arrays. This stands in stark contrast to other pattern domains such as itemsets or subgraphs, where super-linear lower bounds are known. However, the space requirement of existing solutions is O(n log n) bits, which is not optimal for |Σ| ≪ n (in particular for constant |Σ|), as the databases themselves occupy only n log |Σ| bits. Because in many real-life applications space is a more critical resource than time, the aim of this article is to reduce the space, at the cost of an increased running time. In particular, we give a solution for the above problems that uses O(n log |Σ| + d log n) bits, while the time requirement is increased from the optimal linear time to O(n log n). Our new method is tested extensively on biologically relevant datasets and shown to be usable even on genome-scale data.
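To make the problem statement concrete, the following is a minimal brute-force sketch in Python, not the space-efficient method of the paper. It assumes that the frequency of a pattern in a database is the number of strings containing it as a substring, and that a discriminative pattern is one with frequency at least `min_d1` in D_1 and at most `max_d2` in D_2. The function names `support` and `mine_discriminative` and both threshold parameters are hypothetical, introduced only for illustration.

```python
from collections import Counter


def support(db):
    """Map each substring to the number of strings in db that contain it.

    Assumption: "frequency" means document frequency, so a substring that
    occurs several times inside one string is counted once for that string.
    """
    counts = Counter()
    for s in db:
        # A set removes repeated occurrences within the same string.
        subs = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        counts.update(subs)
    return counts


def mine_discriminative(d1, d2, min_d1, max_d2):
    """Return patterns frequent in d1 (>= min_d1) and rare in d2 (<= max_d2)."""
    sup1 = support(d1)
    sup2 = support(d2)
    # Counter returns 0 for patterns that never occur in d2.
    return sorted(p for p, c in sup1.items() if c >= min_d1 and sup2[p] <= max_d2)


if __name__ == "__main__":
    d1 = ["abab", "abc", "bab"]
    d2 = ["bcb", "cab"]
    print(mine_discriminative(d1, d2, min_d1=2, max_d2=0))  # ['ba', 'bab']
```

This sketch enumerates every substring explicitly, so its time and memory grow quadratically with string length. It only illustrates the kind of output being asked for; the linear-time and space-efficient solutions described in the abstract rely on suffix trees, suffix arrays, or compressed index structures instead.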
Pages: 193 / +
Page count: 2