CloFAST: closed sequential pattern mining using sparse and vertical id-lists

Cited by: 0
Authors
Fabio Fumarola
Pasqua Fabiana Lanotte
Michelangelo Ceci
Donato Malerba
Affiliation
[1] University of Bari “A. Moro”, Department of Computer Science
Source
Knowledge and Information Systems | 2016 / Vol. 48
Keywords
Sequential pattern mining; Closed sequences; Data mining; Itemset
DOI
Not available
Abstract
Sequential pattern mining is computationally challenging because algorithms have to generate and/or test a combinatorially explosive number of intermediate subsequences. To reduce this complexity, some researchers focus on mining closed sequential patterns. This not only increases efficiency but also yields a more compact result set, while preserving the expressive power of the patterns extracted by traditional (non-closed) sequential pattern mining algorithms. In this paper, we present CloFAST, a novel algorithm for mining closed frequent sequences of itemsets. It combines a new representation of the dataset, based on sparse id-lists and vertical id-lists, whose theoretical properties are studied in order to count the support of sequential patterns quickly, with a novel one-step technique that both checks sequence closure and prunes the search space. Unlike almost all existing algorithms, which iteratively alternate itemset extension and sequence extension, CloFAST proceeds in two steps. First, all closed frequent itemsets are mined to obtain an initial set of sequences of size 1. Then, new sequences are generated by working directly on the sequences, without mining additional frequent itemsets. A thorough performance study on both real-world and artificially generated datasets shows empirically that CloFAST outperforms state-of-the-art algorithms in both time and memory consumption, especially when mining long closed sequences.
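The abstract names the two id-list structures but does not define them. The sketch below illustrates one common reading of the idea and is not the authors' implementation: a sparse id-list records, for each input sequence, the positions at which an itemset occurs, and a vertical id-list records, for each input sequence, the position at which the first occurrence of a pattern ends. Under that assumption, a sequence extension only needs the first itemset position after the current end, and the support is the number of non-empty entries. The toy database and the function names (`sparse_id_list`, `initial_vil`, `extend`, `support`) are hypothetical and chosen for illustration only.

```python
from bisect import bisect_right

# Hypothetical toy database: each sequence is an ordered list of itemsets.
DB = [
    [{"a"}, {"a", "b"}, {"c"}],
    [{"a", "b"}, {"c"}],
    [{"b"}, {"a"}, {"c"}, {"a", "b"}],
    [{"c"}, {"b"}],
]


def sparse_id_list(itemset, db):
    """Per sequence: sorted 1-based positions whose itemset contains `itemset`."""
    return [
        [pos for pos, t in enumerate(seq, start=1) if itemset <= t]
        for seq in db
    ]


def initial_vil(sil):
    """Vertical id-list of a one-itemset pattern: first position, or None."""
    return [positions[0] if positions else None for positions in sil]


def extend(vil, sil):
    """Sequence extension: first occurrence of the new itemset strictly
    after the position where the current pattern ends (None if absent)."""
    out = []
    for end, positions in zip(vil, sil):
        if end is None:
            out.append(None)
            continue
        i = bisect_right(positions, end)
        out.append(positions[i] if i < len(positions) else None)
    return out


def support(vil):
    """Number of input sequences that contain the pattern."""
    return sum(1 for v in vil if v is not None)


sil_a = sparse_id_list({"a"}, DB)
sil_c = sparse_id_list({"c"}, DB)
sil_ab = sparse_id_list({"a", "b"}, DB)

vil_a = initial_vil(sil_a)        # pattern <{a}>
vil_ac = extend(vil_a, sil_c)     # pattern <{a}, {c}>
vil_acab = extend(vil_ac, sil_ab) # pattern <{a}, {c}, {a, b}>

print(support(vil_a), support(vil_ac), support(vil_acab))  # prints: 3 3 1
```

In this sketch the database is scanned only to build the sparse id-lists; each later extension works on the lists alone, which is the kind of saving the abstract attributes to the representation. Closure checking and search-space pruning are not shown.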
Pages: 429-463
Page count: 34