Space-Efficient String Mining under Frequency Constraints

Cited by: 12

Authors:

Fischer, Johannes [1]

Makinen, Veli [2]

Valimaki, Niko [2]

Affiliations:

[1] Univ Tubingen, Ctr Bioinformat ZBIT, Sand 14, D-72076 Tubingen, Germany

[2] Univ Helsinki, Dept Comp Sci, Helsinki, Finland

Source:

ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2008

Funding:

Academy of Finland;

Keywords:

DOI:

10.1109/ICDM.2008.32

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Let D_1 and D_2 be two databases (i.e., multisets) of d strings over an alphabet Σ, with overall length n. We study the problem of mining discriminative patterns between D_1 and D_2, e.g., patterns that are frequent in one database but not in the other (emerging patterns), or patterns satisfying other frequency-related constraints. Using the algorithmic framework by Hui (CPM 1992), one can solve several variants of this problem in optimal linear time with the aid of suffix trees or suffix arrays. This stands in sharp contrast to other pattern domains such as itemsets or subgraphs, where super-linear lower bounds are known. However, the space requirement of existing solutions is O(n log n) bits, which is not optimal for |Σ| ≪ n (in particular for constant |Σ|), as the databases themselves occupy only n log |Σ| bits. Because in many real-life applications space is a more critical resource than time, the aim of this article is to reduce the space at the cost of an increased running time. In particular, we give a solution for the above problems that uses O(n log |Σ| + d log n) bits, while the time requirement increases from the optimal linear time to O(n log n). Our new method is tested extensively on biologically relevant datasets and shown to be usable even on genome-scale data.
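
To make the problem statement concrete, the following is a minimal brute-force sketch of mining substrings under frequency constraints. It is not the method of the paper, which relies on suffix trees or suffix arrays and compressed data structures; it enumerates all substrings explicitly and is only practical for tiny inputs. The function names and the thresholds `min_freq_d1` and `max_freq_d2` are hypothetical, and the frequency of a pattern is assumed here to mean the number of strings in a database that contain it.

```python
from collections import Counter


def document_frequencies(database):
    """Map each substring to the number of strings in `database` containing it."""
    freq = Counter()
    for s in database:
        # A set ensures that each string counts a given substring at most once.
        substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
        freq.update(substrings)
    return freq


def mine_discriminative(d1, d2, min_freq_d1, max_freq_d2):
    """Return substrings found in at least `min_freq_d1` strings of `d1`
    and in at most `max_freq_d2` strings of `d2` (assumed constraint form)."""
    f1 = document_frequencies(d1)
    f2 = document_frequencies(d2)
    # Counter returns 0 for substrings that never occur in d2.
    return sorted(p for p, c in f1.items() if c >= min_freq_d1 and f2[p] <= max_freq_d2)


if __name__ == "__main__":
    d1 = ["abab", "bab", "ab"]
    d2 = ["ba", "bb"]
    print(mine_discriminative(d1, d2, min_freq_d1=2, max_freq_d2=0))  # ['ab', 'bab']
```

In this toy input, "ab" and "bab" each occur in at least two strings of the first database and in none of the second, so they are reported as discriminative. The sketch stores every distinct substring, which is exactly the kind of space cost that index-based approaches are designed to avoid.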


Pages: 193 / +

Page count: 2


