ACME: A scalable parallel system for extracting frequent patterns from a very long sequence

Cited by: 9
Authors
Sahli, Majed [1]
Mansour, Essam [2]
Kalnis, Panos [1]
Affiliations
[1] King Abdullah Univ Sci & Technol, Thuwal, Saudi Arabia
[2] Qatar Comp Res Inst, Doha, Qatar
Source
VLDB Journal | 2014, Vol. 23, No. 6
Keywords
Automatic tuning; Cache efficient; Cloud; Elastic; Motif; Suffix tree; Discovery; Efficient; Construction
DOI
10.1007/s00778-014-0370-1
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modern applications, including bioinformatics, time series, and web log analysis, require the extraction of frequent patterns, called motifs, from one very long (i.e., several gigabytes) sequence. Existing approaches are either error-prone heuristics or exact (also called combinatorial) methods that are extremely slow and therefore applicable only to very small sequences (i.e., on the order of megabytes). This paper presents ACME, a combinatorial approach that scales to gigabyte-long sequences and is the first to support supermaximal motifs. ACME is a versatile parallel system that can be deployed on desktop multi-core systems or on thousands of CPUs in the cloud. However, merely using more compute nodes does not guarantee efficiency, because of the related overheads. To address this, ACME introduces an automatic tuning mechanism that suggests the appropriate number of CPUs to use in order to meet the user's run-time constraints while minimizing the financial cost of cloud resources. Our experiments show that, compared to the state of the art, ACME supports sequences three orders of magnitude longer (e.g., DNA for the entire human genome); handles large alphabets (e.g., the English alphabet for Wikipedia); scales out to 16,384 CPUs on a supercomputer; and supports elastic deployment in the cloud.
Pages: 871-893
Page count: 23