ACME: A scalable parallel system for extracting frequent patterns from a very long sequence

Cited by: 9
Authors
Sahli, Majed [1 ]
Mansour, Essam [2 ]
Kalnis, Panos [1 ]
Affiliations
[1] King Abdullah Univ Sci & Technol, Thuwal, Saudi Arabia
[2] Qatar Comp Res Inst, Doha, Qatar
Source
VLDB JOURNAL | 2014 / Vol. 23 / Issue 06
Keywords
Automatic tuning; Cache efficient; Cloud; Elastic; Motif; Suffix tree; Discovery; Efficient; Construction
DOI
10.1007/s00778-014-0370-1
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modern applications, including bioinformatics, time series, and web log analysis, require the extraction of frequent patterns, called motifs, from one very long (i.e., several gigabytes) sequence. Existing approaches are either heuristics that are error-prone, or exact (also called combinatorial) methods that are extremely slow and therefore applicable only to very small sequences (i.e., on the order of megabytes). This paper presents ACME, a combinatorial approach that scales to gigabyte-long sequences and is the first to support supermaximal motifs. ACME is a versatile parallel system that can be deployed on desktop multi-core systems, or on thousands of CPUs in the cloud. However, merely using more compute nodes does not guarantee efficiency, because of the related overheads. To this end, ACME introduces an automatic tuning mechanism that suggests the appropriate number of CPUs to utilize, in order to meet the user constraints in terms of run time, while minimizing the financial cost of cloud resources. Our experiments show that, compared to the state of the art, ACME supports three orders of magnitude longer sequences (e.g., DNA for the entire human genome); handles large alphabets (e.g., English alphabet for Wikipedia); scales out to 16,384 CPUs on a supercomputer; and supports elastic deployment in the cloud.
Pages: 871-893
Page count: 23