Mining approximate patterns with frequent locally optimal occurrences

Cited by: 3

Authors:

Nakamura, Atsuyoshi [1]

Takigawa, Ichigaku [1]

Tosaka, Hisashi [2]

Kudo, Mineichi [1]

Mamitsuka, Hiroshi [3]

Affiliations:

[1] Hokkaido Univ, Kita Ku, Kita 14, Nishi 9, Sapporo, Hokkaido 0600814, Japan

[2] NS Solut Corp, Tokyo, Japan

[3] Kyoto Univ, Inst Chem Res, Uji, Kyoto 6110011, Japan

Source:

DISCRETE APPLIED MATHEMATICS | 2016 / Vol. 200

Keywords:

Alignment; Frequent pattern mining; String; Ordered tree; DNA; SEQUENTIAL PATTERNS; EFFICIENT; REPEATS; IDENTIFICATION; ALGORITHMS; DISCOVERY; FAMILIES

DOI:

10.1016/j.dam.2015.07.002

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics]

Discipline code:

070104

Abstract:

We consider a frequent approximate pattern mining problem, in which interspersed repetitive regions are extracted from a given string. That is, we enumerate substrings that frequently match substrings of the given string locally and optimally. For this problem, we propose a new algorithm in which candidate patterns are generated without duplication using the suffix tree of the given string. We further define a k-gap-constrained setting, in which the number of gaps in the alignment between a pattern and an occurrence is at most k. Under this setting, we present memory-efficient algorithms, in particular a candidate-based version that runs fast enough even on human chromosome sequences with more than 10 million nucleotides. We note that our problem and algorithms for strings can be directly extended to ordered labeled trees. In our experiments we used both randomly synthesized strings, in which corrupted similar substrings are embedded, and real human chromosome data. The synthetic data experiments show that our proposed approach extracted the embedded patterns correctly and time-efficiently. In the real data experiments, we grouped the patterns obtained by our k-gap-constrained versions (k = 0, 1 and 2) into 100 clusters and examined the cluster centers; the results revealed that the regions of their occurrences coincided with around half of the regions automatically annotated as Alu sequences by a manually curated repeat sequence database. (C) 2015 Elsevier B.V. All rights reserved.
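The k-gap-constrained setting can be illustrated with a small dynamic program. The Python sketch below is not the authors' algorithm: it omits the suffix-tree candidate generation entirely and rests on stated assumptions: (a) an occurrence is an alignment of the whole pattern against some substring of the text, (b) each inserted or deleted character counts as one gap, and (c) "locally optimal" is crudely approximated by keeping end positions whose score reaches a threshold and rises above its left neighbour without dropping below its right one. The function names `fitting_scores` and `locally_optimal_ends` and the scoring parameters `match`, `mismatch`, `gap` are hypothetical.

```python
from typing import List, Optional

NEG_INF = float("-inf")


def fitting_scores(pattern: str, text: str, k: int,
                   match: int = 1, mismatch: int = -1, gap: int = -2) -> List[float]:
    """For each end position j (0..len(text)), return the best score of aligning
    the whole pattern to some substring text[s:j] using at most k gap characters.

    Assumption (not from the paper): every inserted or deleted character
    counts as one gap and costs `gap`.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    m, n = len(pattern), len(text)
    prev: Optional[List[List[float]]] = None  # table for gap budget g - 1
    cur: List[List[float]] = []
    for g in range(k + 1):
        cur = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
        for j in range(n + 1):
            cur[0][j] = 0  # empty pattern prefix: the alignment may start anywhere
        for i in range(1, m + 1):
            # A pattern prefix against an empty text prefix needs i deletions.
            cur[i][0] = i * gap if i <= g else NEG_INF
            for j in range(1, n + 1):
                s = match if pattern[i - 1] == text[j - 1] else mismatch
                best = cur[i - 1][j - 1] + s  # match/mismatch spends no gap
                if prev is not None:  # spending one gap needs budget g >= 1
                    best = max(best,
                               prev[i - 1][j] + gap,  # pattern char vs gap
                               prev[i][j - 1] + gap)  # text char vs gap
                cur[i][j] = best
        prev = cur
    return cur[m]


def locally_optimal_ends(scores: List[float], threshold: float) -> List[int]:
    """Hypothetical simplification of 'locally optimal occurrence': keep end
    positions j whose score reaches `threshold`, is strictly above the score at
    j - 1 and is not below the score at j + 1."""
    ends = []
    for j in range(1, len(scores)):
        left = scores[j - 1]
        right = scores[j + 1] if j + 1 < len(scores) else NEG_INF
        if scores[j] >= threshold and scores[j] > left and scores[j] >= right:
            ends.append(j)
    return ends


if __name__ == "__main__":
    scores = fitting_scores("ACG", "ACGTTACG", k=0)
    print(locally_optimal_ends(scores, threshold=3))  # [3, 8]
    # With one gap allowed, "ACGT" aligns to "ACT" with G against a gap.
    print(fitting_scores("ACGT", "ACT", k=1)[-1])  # 1
```

Under these assumptions, a candidate pattern would count as frequent when the number of returned end positions reaches a chosen minimum support. The table for gap budget g reads only the table for g - 1, so roughly two (m+1) x (n+1) tables are alive at any time regardless of k.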

Pages: 123-152

Number of pages: 30