CloFAST: closed sequential pattern mining using sparse and vertical id-lists

Cited by: 0
Authors
Fabio Fumarola
Pasqua Fabiana Lanotte
Michelangelo Ceci
Donato Malerba
Affiliations
[1] University of Bari "A. Moro", Department of Computer Science
Source
Keywords
Sequential pattern mining; Closed sequences; Data mining; Itemset
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Sequential pattern mining is computationally challenging because algorithms must generate and/or test a combinatorially explosive number of intermediate subsequences. To reduce this complexity, some researchers focus on mining closed sequential patterns. This not only increases efficiency but also compacts the results while preserving the expressive power of the patterns extracted by traditional (non-closed) sequential pattern mining algorithms. In this paper, we present CloFAST, a novel algorithm for mining closed frequent sequences of itemsets. It combines a new representation of the dataset, based on sparse id-lists and vertical id-lists, whose theoretical properties are studied in order to count the support of sequential patterns quickly, with a novel one-step technique that both checks sequence closure and prunes the search space. Contrary to almost all existing algorithms, which iteratively alternate itemset extension and sequence extension, CloFAST proceeds in two steps. First, all closed frequent itemsets are mined to obtain an initial set of sequences of size 1. Then, new sequences are generated by working directly on the sequences, without mining additional frequent itemsets. A thorough performance study with both real-world and artificially generated datasets shows empirically that CloFAST outperforms the state-of-the-art algorithms in both time and memory consumption, especially when mining long closed sequences.
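The idea of counting support from id-lists rather than by rescanning the dataset can be illustrated with a minimal sketch. This is not the CloFAST implementation or its exact data structures: the functions build_id_lists and support, the toy database, and the restriction to patterns made of single items are all assumptions made here for illustration. Each item is mapped to the positions at which it occurs in each sequence, and a pattern is matched by always jumping to the earliest admissible position.

```python
from bisect import bisect_right


def build_id_lists(database):
    """Map each item to {sequence index: increasing positions of itemsets containing it}.

    `database` is a list of sequences; each sequence is a list of itemsets (sets).
    Hypothetical helper for illustration only.
    """
    id_lists = {}
    for sid, sequence in enumerate(database):
        for pos, itemset in enumerate(sequence):
            for item in itemset:
                id_lists.setdefault(item, {}).setdefault(sid, []).append(pos)
    return id_lists


def support(pattern, id_lists):
    """Count the sequences that contain `pattern` (a list of single items) in order."""
    if not pattern:
        return 0
    # For every sequence containing the first item, remember the earliest
    # position at which the prefix matched so far can end.
    first_end = {
        sid: positions[0]
        for sid, positions in id_lists.get(pattern[0], {}).items()
    }
    for item in pattern[1:]:
        item_lists = id_lists.get(item, {})
        next_end = {}
        for sid, end in first_end.items():
            positions = item_lists.get(sid)
            if not positions:
                continue  # the item never occurs in this sequence
            i = bisect_right(positions, end)  # first position strictly after `end`
            if i < len(positions):
                next_end[sid] = positions[i]
        first_end = next_end
    # Sequences that still have a valid end position contain the whole pattern.
    return len(first_end)


# Toy database: three sequences of itemsets (made up for this example).
database = [
    [{"a"}, {"b", "c"}, {"a"}],
    [{"b"}, {"a"}],
    [{"a", "b"}, {"c"}],
]
id_lists = build_id_lists(database)
print(support(["a", "b"], id_lists))
print(support(["b", "a"], id_lists))
```

Because only sequences that already contain the prefix are carried forward, each extension touches a shrinking set of id-list entries, which is the general reason vertical representations tend to make support counting cheap.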
Pages: 429-463
Page count: 34
Related papers
50 in total
  • [1] CloFAST: closed sequential pattern mining using sparse and vertical id-lists
    Fumarola, Fabio
    Lanotte, Pasqua Fabiana
    Ceci, Michelangelo
    Malerba, Donato
    KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 48 (02) : 429 - 463
  • [2] FAST Sequence Mining Based on Sparse Id-Lists
    Salvemini, Eliana
    Fumarola, Fabio
    Malerba, Donato
    Han, Jiawei
    FOUNDATIONS OF INTELLIGENT SYSTEMS, 2011, 6804 : 316 - 325
  • [3] Closed multidimensional sequential pattern mining
    Songram, Panida
    Boonjing, Veera
    Intakosum, Sarun
    THIRD INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, PROCEEDINGS, 2006, : 512 - +
  • [4] A Survey on Closed Sequential Pattern Mining
    Raju, V. Purushothama
    Varma, G. P. Saradhi
    2014 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2014,
  • [5] CLOSED SEQUENTIAL PATTERN MINING IN BIOLOGICAL DATA
    Jawahar, S.
    Harishchander, A.
    Devaraju, S.
    Ali, S. Ahamed Johnsha
    Manivasagan, C.
    Sumathi, P.
    INTERNATIONAL JOURNAL OF LIFE SCIENCE AND PHARMA RESEARCH, 2020, : 9 - 13
  • [6] Closed sequential pattern mining for sitemap generation
    Michelangelo Ceci
    Pasqua Fabiana Lanotte
    World Wide Web, 2021, 24 : 175 - 203
  • [7] NetNCSP: Nonoverlapping closed sequential pattern mining
    Wu, Youxi
    Zhu, Changrui
    Li, Yan
    Guo, Lei
    Wu, Xindong
    KNOWLEDGE-BASED SYSTEMS, 2020, 196 (196)
  • [8] Closed sequential pattern mining for sitemap generation
    Ceci, Michelangelo
    Lanotte, Pasqua Fabiana
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2021, 24 (01): : 175 - 203
  • [9] Closed sequential pattern mining in high dimensional sequences
    Han, Meng
    Wang, Zhihai
    Yuan, Jidong
    Journal of Software, 2013, 8 (06) : 1368 - 1373
  • [10] A closed sequential pattern mining algorithm in time order
    Fu, Yu
    Yu, Yan-Hua
    Song, Mei-Na
    Zhan, Xiao-Su
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2013, 36 (04): : 19 - 22