Memory-adaptive high utility sequential pattern mining over data streams

Cited by: 27
Authors
Zihayat, Morteza [1]
Chen, Yan [1]
An, Aijun [1]
Affiliations
[1] York Univ, Dept Comp Sci & Engn, 4700 Keele St, Toronto, ON M3J 1P3, Canada
Keywords
High utility sequential pattern mining; Data streams; Approximation algorithms; Efficient algorithm; Itemsets
DOI
10.1007/s10994-016-5617-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High utility sequential pattern (HUSP) mining has emerged as an important topic in data mining. A number of studies have been conducted on mining HUSPs, but they are mainly intended for non-streaming data and thus do not take data stream characteristics into consideration. Streaming data are fast-changing, continuously generated, and unbounded in quantity. Such data can easily exhaust computer resources (e.g., memory) unless resource-aware mining is performed. In this study, we explore the fundamental problem of how limited memory can best be utilized to produce high-quality HUSPs over a data stream. We design an approximation algorithm, called MAHUSP, that employs memory-adaptive mechanisms to use a bounded portion of memory in order to efficiently discover HUSPs over data streams. An efficient tree structure, called MAS-Tree, is proposed to store potential HUSPs over a data stream. MAHUSP guarantees that all HUSPs are discovered under certain circumstances. Our experimental study shows that our algorithm can not only discover HUSPs over data streams efficiently, but also adapt to memory allocation with limited sacrifices in the quality of the discovered HUSPs. Furthermore, to show the effectiveness and efficiency of MAHUSP in real-life applications, we apply our proposed algorithm to a web clickstream dataset obtained from a Canadian news portal to showcase users' reading behavior, and to a real biosequence database to identify disease-related gene regulation sequential patterns. The results show that MAHUSP effectively discovers useful and meaningful patterns in both cases.
Pages: 799 - 836
Page count: 38
Related papers
50 in total
  • [1] Memory-adaptive high utility sequential pattern mining over data streams
    Morteza Zihayat
    Yan Chen
    Aijun An
    [J]. Machine Learning, 2017, 106 : 799 - 836
  • [2] High utility pattern mining over data streams with sliding window technique
    Ryang, Heungmo
    Yun, Unil
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2016, 57 : 214 - 231
  • [3] Damped window based high average utility pattern mining over data streams
    Yun, Unil
    Kim, Donggyu
    Yoon, Eunchul
    Fujita, Hamido
    [J]. KNOWLEDGE-BASED SYSTEMS, 2018, 144 : 188 - 205
  • [4] High utility pattern mining algorithm over data streams using ext-list
    Han, Meng
    Li, Muhang
    Chen, Zhiqiang
    Wu, Hongxin
    Zhang, Xilong
    [J]. APPLIED INTELLIGENCE, 2023, 53 (22) : 27072 - 27095
  • [5] High utility pattern mining algorithm over data streams using ext-list.
    Meng Han
    Muhang Li
    Zhiqiang Chen
    Hongxin Wu
    Xilong Zhang
    [J]. Applied Intelligence, 2023, 53 : 27072 - 27095
  • [6] Interactive mining of high utility patterns over data streams
    Ahmed, Chowdhury Farhan
    Tanbeer, Syed Khairuzzaman
    Jeong, Byeong-Soo
    Choi, Ho-Jin
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (15) : 11979 - 11991
  • [7] Automatic Sequential Pattern Mining in Data Streams
    Kawabata, Koki
    Matsubara, Yasuko
    Sakurai, Yasushi
    [J]. PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM '19), 2019, : 1733 - 1742
  • [8] A New Adaptive Algorithm for Frequent Pattern Mining over Data Streams
    Deypir, Mahmood
    Sadreddini, Mohammad Hadi
    [J]. 2011 1ST INTERNATIONAL ECONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2011, : 230 - 235
  • [9] Fast and Memory Efficient Mining of High Utility Itemsets in Data Streams
    Li, Hua-Fu
    Huang, Hsin-Yun
    Chen, Yi-Cheng
    Liu, Yu-Jiun
    Lee, Suh-Yin
    [J]. ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, : 881 - +
  • [10] A New Algorithm of Mining High Utility Sequential Pattern in Streaming Data
    Tang, Huijun
    Liu, Yangguang
    Wang, Le
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2019, 12 (01) : 342 - 350