Mining approximate patterns with frequent locally optimal occurrences

Cited by: 3
Authors
Nakamura, Atsuyoshi [1]
Takigawa, Ichigaku [1]
Tosaka, Hisashi [2]
Kudo, Mineichi [1]
Mamitsuka, Hiroshi [3]
Affiliations
[1] Hokkaido Univ, Kita Ku, Kita 14, Nishi 9, Sapporo, Hokkaido 0600814, Japan
[2] NS Solut Corp, Tokyo, Japan
[3] Kyoto Univ, Inst Chem Res, Uji, Kyoto 6110011, Japan
Keywords
Alignment; Frequent pattern mining; String; Ordered tree; DNA; SEQUENTIAL PATTERNS; EFFICIENT; REPEATS; IDENTIFICATION; ALGORITHMS; DISCOVERY; FAMILIES
DOI
10.1016/j.dam.2015.07.002
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
We consider a frequent approximate pattern mining problem in which interspersed repetitive regions are extracted from a given string. That is, we enumerate substrings that frequently match substrings of the given string locally and optimally. For this problem, we propose a new algorithm in which candidate patterns are generated without duplication using the suffix tree of the given string. We further define a k-gap-constrained setting, in which the number of gaps in the alignment between a pattern and an occurrence is limited to at most k. Under this setting, we present memory-efficient algorithms, in particular a candidate-based version, which runs fast enough even on human chromosome sequences with more than 10 million nucleotides. We note that our problem and algorithms for strings can be directly extended to ordered labeled trees. In our experiments, we used both randomly synthesized strings, in which corrupted similar substrings are embedded, and real data from human chromosomes. The synthetic data experiments show that our proposed approach extracted the embedded patterns correctly and time-efficiently. In the real data experiments, we examined the centers of 100 clusters computed after grouping the patterns obtained by our k-gap-constrained versions (k = 0, 1 and 2), and the results revealed that the regions of their occurrences coincided with around half of the regions automatically annotated as Alu sequences by a manually curated repeat sequence database. (C) 2015 Elsevier B.V. All rights reserved.
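To make the k-gap constraint concrete, here is a hypothetical Python sketch (not the authors' algorithm): a Smith-Waterman-style local alignment score in which the alignment may use at most k gap columns. The function name `k_gap_local_score`, the scoring values (match +2, mismatch -1, gap -2), and the choice to count every inserted or deleted character as one gap are illustrative assumptions; the paper's scoring scheme and its exact definition of a gap may differ.

```python
def k_gap_local_score(a: str, b: str, k: int,
                      match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a vs. b using at most k gap columns.

    Sketch under assumptions: the scores and the gap definition (each
    inserted or deleted character counts as one gap) are hypothetical.
    """
    n, m = len(a), len(b)
    # dp[g][i][j]: best score of an alignment of a suffix of a[:i] with a
    # suffix of b[:j] that uses at most g gap columns; 0 = empty alignment.
    dp = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(k + 1)]
    for g in range(k + 1):
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                # Either start a fresh local alignment or extend diagonally.
                v = max(0, dp[g][i - 1][j - 1] + s)
                if g > 0:
                    # Spend one unit of the gap budget on a deletion
                    # (a[i-1] vs. gap) or an insertion (gap vs. b[j-1]).
                    v = max(v, dp[g - 1][i - 1][j] + gap,
                            dp[g - 1][i][j - 1] + gap)
                dp[g][i][j] = v
    # With "at most g" semantics, layer k dominates all smaller layers.
    return max(max(row) for row in dp[k])


# "ACGACGT" is "ACGTACGT" with one T deleted.
print(k_gap_local_score("ACGTACGT", "ACGACGT", 0))  # ungapped: best is "ACGT" vs "ACGT"
print(k_gap_local_score("ACGTACGT", "ACGACGT", 1))  # one gap bridges the deletion
```

With k = 0 the best local match is a single ungapped "ACGT" block (score 8); allowing one gap lets the alignment span all seven characters of the second string (7 × 2 - 2 = 12). The cubic table costs O(k·|a|·|b|) time and memory, so it only illustrates the constraint and is not meant for chromosome-scale inputs.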
Pages: 123 - 152
Number of pages: 30
Related papers
50 in total
  • [21] Mining XML frequent query patterns
    Hua, Cheng
    Zhao, Hai-jun
    Chen, Yi
    [J]. INTEGRATION AND INNOVATION ORIENT TO E-SOCIETY, VOL 1, 2007, 251 : 26 - +
  • [22] Mining Frequent Patterns with Counting Quantifiers
    He, Yanxiao
    Wang, Xin
    Sha, Yuji
    Zhong, Xueyan
    Fang, Yu
    [J]. WEB AND BIG DATA, PT I, APWEB-WAIM 2022, 2023, 13421 : 372 - 381
  • [23] Methods for mining frequent sequential patterns
    Jiang, LH
    Hamilton, HJ
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2003, 2671 : 486 - 491
  • [24] Mining Frequent Patterns with Differential Privacy
    Bonomi, Luca
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (12) : 1422 - 1427
  • [25] Mining Frequent Patterns on Knowledge Graphs
    Mouatadid, Lalla
    [J]. WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2022, : 1647 - 1647
  • [26] Frequent Patterns Mining in DNA Sequence
    Deng, Na
    Chen, Xu
    Li, Desheng
    Xiong, Caiquan
    [J]. IEEE ACCESS, 2019, 7 : 108400 - 108410
  • [27] Mining Frequent Patterns in Evolving Graphs
    Aslay, Cigdem
    Nasir, Muhammad Anis Uddin
    Morales, Gianmarco De Francisci
    Gionis, Aristides
    [J]. CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 923 - 932
  • [28] Mining Frequent Composite Service Patterns
    Meng, Hui
    Wu, Lifa
    Zhang, Tianlei
    Chen, Guisheng
    Li, Deyi
    [J]. GCC 2008: SEVENTH INTERNATIONAL CONFERENCE ON GRID AND COOPERATIVE COMPUTING, PROCEEDINGS, 2008, : 713 - +
  • [29] Image Clustering Based on Frequent Approximate Subgraph Mining
    Acosta-Mendoza, Niusvel
    Ariel Carrasco-Ochoa, Jesus
    Martinez-Trinidad, Jose Fco.
    Gago-Alonso, Andres
    Medina-Pagola, Jose E.
    [J]. PATTERN RECOGNITION, 2018, 10880 : 189 - 198
  • [30] Mining approximate closed frequent itemsets over stream
    Li, Haifeng
    Lu, Zongjian
    Chen, Hong
    [J]. PROCEEDINGS OF NINTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2008, : 405 - 410