Space-Efficient String Mining under Frequency Constraints

Cited by: 12
Authors
Fischer, Johannes [1]
Makinen, Veli [2]
Valimaki, Niko [2]
Affiliations
[1] Univ Tubingen, Ctr Bioinformat ZBIT, Sand 14, D-72076 Tubingen, Germany
[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
Funding
Academy of Finland
DOI
10.1109/ICDM.2008.32
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Let D_1 and D_2 be two databases (i.e. multisets) of d strings, over an alphabet Σ, with overall length n. We study the problem of mining discriminative patterns between D_1 and D_2, e.g., patterns that are frequent in one database but not in the other, emerging patterns, or patterns satisfying other frequency-related constraints. Using the algorithmic framework by Hui (CPM 1992), one can solve several variants of this problem in optimal linear time with the aid of suffix trees or suffix arrays. This stands in sharp contrast to other pattern domains such as itemsets or subgraphs, where super-linear lower bounds are known. However, the space requirement of existing solutions is O(n log n) bits, which is not optimal for |Σ| << n (in particular for constant |Σ|), as the databases themselves occupy only n log |Σ| bits. Because in many real-life applications space is a more critical resource than time, the aim of this article is to reduce the space, at the cost of an increased running time. In particular, we give a solution for the above problems that uses O(n log |Σ| + d log n) bits, while the time requirement is increased from the optimal linear time to O(n log n). Our new method is tested extensively on biologically relevant datasets and shown to be usable even on genome-scale data.
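To make the problem statement concrete, the following is a minimal brute-force sketch of mining patterns that are frequent in one database but rare in the other. It is not the space-efficient method of the paper, and it does not use suffix trees or suffix arrays: it simply enumerates substrings, so its time and space are far from the bounds quoted above. The function names, the thresholds min_freq_d1 and max_freq_d2, and the max_len cutoff are all assumptions made for illustration.

```python
from collections import Counter


def document_frequency(database, max_len):
    """For every substring of length <= max_len, count how many
    strings of the database contain it (each string counts once)."""
    freq = Counter()
    for s in database:
        seen = set()
        for i in range(len(s)):
            for j in range(i + 1, min(len(s), i + max_len) + 1):
                seen.add(s[i:j])
        # Updating with a set adds 1 per distinct substring of s.
        freq.update(seen)
    return freq


def discriminative_patterns(d1, d2, min_freq_d1, max_freq_d2, max_len=8):
    """Return the sorted patterns occurring in at least min_freq_d1
    strings of d1 and in at most max_freq_d2 strings of d2.
    Hypothetical helper: a naive baseline, not the paper's algorithm."""
    f1 = document_frequency(d1, max_len)
    f2 = document_frequency(d2, max_len)
    return sorted(
        p for p, c in f1.items()
        if c >= min_freq_d1 and f2.get(p, 0) <= max_freq_d2
    )


if __name__ == "__main__":
    d1 = ["abab", "abc", "xab"]
    d2 = ["bcd", "cda"]
    # Patterns found in all three strings of d1 and in none of d2.
    print(discriminative_patterns(d1, d2, min_freq_d1=3, max_freq_d2=0))
```

A baseline like this is mainly useful for checking the output of an index-based implementation on small inputs, since it materializes every substring explicitly.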
Pages: 193 / +
Page count: 2