Space-Efficient String Mining under Frequency Constraints

Cited by: 12
Authors
Fischer, Johannes [1]
Makinen, Veli [2]
Valimaki, Niko [2]
Affiliations
[1] Univ Tubingen, Ctr Bioinformat ZBIT, Sand 14, D-72076 Tubingen, Germany
[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland
Funding
Academy of Finland
DOI
10.1109/ICDM.2008.32
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Let D₁ and D₂ be two databases (i.e., multisets) of d strings over an alphabet Σ, with overall length n. We study the problem of mining discriminative patterns between D₁ and D₂, e.g., patterns that are frequent in one database but not in the other (emerging patterns), or patterns satisfying other frequency-related constraints. Using the algorithmic framework by Hui (CPM 1992), one can solve several variants of this problem in optimal linear time with the aid of suffix trees or suffix arrays. This stands in stark contrast to other pattern domains such as itemsets or subgraphs, where super-linear lower bounds are known. However, the space requirement of existing solutions is O(n log n) bits, which is not optimal for |Σ| ≪ n (in particular for constant |Σ|), as the databases themselves occupy only n log |Σ| bits. Because in many real-life applications space is a more critical resource than time, the aim of this article is to reduce the space at the cost of an increased running time. In particular, we give a solution for the above problems that uses O(n log |Σ| + d log n) bits, while the time requirement increases from the optimal linear time to O(n log n). Our new method is tested extensively on biologically relevant datasets and shown to be usable even on genome-scale data.
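To make the problem statement concrete, the following is a minimal brute-force sketch of emerging-pattern mining on two small string databases. It is not the paper's method: it uses neither suffix trees, suffix arrays, nor Hui's framework, and it takes quadratic space per string. The function names, the thresholds min_freq_d1 and max_freq_d2, and the definition of frequency as the fraction of strings in a database that contain the pattern are assumptions made for illustration only.

```python
from collections import Counter


def substring_support(db):
    """Map each distinct substring to the number of strings in db containing it."""
    support = Counter()
    for s in db:
        # Collect substrings in a set so each string counts at most once per pattern.
        subs = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        support.update(subs)
    return support


def emerging_patterns(d1, d2, min_freq_d1, max_freq_d2):
    """Return patterns frequent in d1 and infrequent in d2 (illustrative thresholds)."""
    sup1 = substring_support(d1)
    sup2 = substring_support(d2)
    return sorted(
        p
        for p, count in sup1.items()
        if count / len(d1) >= min_freq_d1
        and sup2.get(p, 0) / len(d2) <= max_freq_d2
    )


# Hypothetical toy databases (lists, so duplicate strings are allowed as in a multiset).
d1 = ["abab", "abc"]
d2 = ["bcd", "cd"]
print(emerging_patterns(d1, d2, min_freq_d1=1.0, max_freq_d2=0.0))
```

On this toy input only "a" and "ab" occur in every string of the first database and in no string of the second; "b" is excluded because it also occurs in "bcd".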
Pages: 193 / +
Page count: 2