An efficient approach for mining sequential patterns using multiple threads on very large databases

Cited by: 23
Authors
Bao Huynh [1, 2]
Cuong Trinh [3, 4]
Huy Huynh [3, 4]
Thien-Trang Van [5]
Bay Vo [5, 6]
Vaclav Snasel [4]
Affiliations
[1] Ton Duc Thang Univ, Ctr Appl Informat Technol, Ho Chi Minh City, Vietnam
[2] Ton Duc Thang Univ, Fac Informat Technol, Ho Chi Minh City, Vietnam
[3] Ton Duc Thang Univ, Dept Comp & Comp Serv, Ho Chi Minh City, Vietnam
[4] VSB Tech Univ Ostrava, Fac Elect Engn & Comp Sci, Ostrava, Poruba, Czech Republic
[5] Ho Chi Minh City Univ Technol HUTECH, Fac Informat Technol, Ho Chi Minh City, Vietnam
[6] Sejong Univ, Coll Elect & Informat Engn, Seoul, South Korea
Keywords
Sequential patterns; Multi-core processors; Multi-threading; Early pruning; SEQUENCES
DOI
10.1016/j.engappai.2018.06.009
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sequential pattern mining (SPM) plays an important role in data mining, with broad applications in areas such as financial markets, education, medicine, and prediction. Although there are many efficient algorithms for SPM, the mining time is still high, especially when mining sequential patterns from huge databases, which requires the use of a parallel technique. In this paper, we propose a parallel approach named MCM-SPADE (Multiple threads CM-SPADE), a multi-threading technique for SPM on very large databases that runs on a multi-core processor system and improves on the performance of the previous methods SPADE and CM-SPADE. The proposed algorithm uses the vertical data format and a data structure named CMAP (Co-occurrence MAP) for storing co-occurrence information. Based on CMAP, the proposed algorithm prunes candidates early to reduce the search space, and it partitions the related tasks among the processor cores by using the divide-and-conquer property. The proposed algorithm also uses dynamic scheduling to avoid task idling and achieve load balancing between processor cores. The experimental results show that MCM-SPADE attains good parallelization efficiency on various input databases.
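The ideas named in the abstract (vertical format, co-occurrence map, early pruning, one task per prefix class, a shared task queue) can be illustrated with a small Python sketch. This is not the authors' MCM-SPADE implementation: every function name below is hypothetical, each sequence is simplified to a list of single items (so only sequence extensions are modelled, not itemset extensions), and a `ThreadPoolExecutor` stands in for the dynamic scheduler. In CPython the global interpreter lock prevents real speed-up for CPU-bound threads, so the sketch shows the task structure rather than the performance.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_vertical(db):
    """Vertical format: item -> {sequence id: sorted positions of the item}."""
    vertical = defaultdict(lambda: defaultdict(list))
    for sid, seq in enumerate(db):
        for pos, item in enumerate(seq):
            vertical[item][sid].append(pos)
    # Plain dicts, so worker threads only ever read these structures.
    return {item: dict(ids) for item, ids in vertical.items()}


def build_cmap(db, minsup):
    """Co-occurrence map: a -> items b that follow a in at least minsup sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in db:
        pairs = {(a, b) for i, a in enumerate(seq) for b in seq[i + 1:]}
        for a, b in pairs:  # each pair is counted once per sequence
            counts[a][b] += 1
    return {a: {b for b, c in followers.items() if c >= minsup}
            for a, followers in counts.items()}


def s_extend(pattern_ids, item_ids):
    """Join id-lists: keep positions of the item after the earliest pattern end."""
    joined = {}
    for sid, ends in pattern_ids.items():
        if sid in item_ids:
            later = [p for p in item_ids[sid] if p > ends[0]]
            if later:
                joined[sid] = later
    return joined


def mine_class(prefix, prefix_ids, frequent, vertical, cmap, minsup):
    """Depth-first search over all patterns that start with the given prefix."""
    found = [(prefix, len(prefix_ids))]
    last = prefix[-1]
    for item in frequent:
        # Early pruning: if <last, item> is infrequent, so is every extension.
        if item not in cmap.get(last, ()):
            continue
        ids = s_extend(prefix_ids, vertical[item])
        if len(ids) >= minsup:
            found += mine_class(prefix + (item,), ids, frequent,
                                vertical, cmap, minsup)
    return found


def mine(db, minsup, workers=4):
    """Submit one task per frequent item; idle workers pull the next task."""
    vertical = build_vertical(db)
    cmap = build_cmap(db, minsup)
    frequent = sorted(i for i, ids in vertical.items() if len(ids) >= minsup)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(mine_class, (i,), vertical[i], frequent,
                               vertical, cmap, minsup) for i in frequent]
        return dict(p for f in futures for p in f.result())


if __name__ == "__main__":
    toy_db = [list("abc"), list("acb"), list("abc"), list("bc")]
    for pattern, support in sorted(mine(toy_db, minsup=2).items()):
        print(pattern, support)
```

Because the prefix classes are independent, the tasks share only read-only structures and need no locking. Submitting more tasks than workers is what lets the pool balance load when one class turns out to be much larger than the others.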
Pages: 242-251
Number of pages: 10
Related papers
50 in total
  • [1] An Efficient Approach to Discovering Sequential Patterns in Large Databases
    Yen, Show-Jane
    Cho, Chung-Wen
    [J]. LECTURE NOTES IN COMPUTER SCIENCE, 2000, 1910 : 685 - 690
  • [2] An Efficient Approach for Mining Weighted Sequential Patterns in Dynamic Databases
    Ishita, Sabrina Zaman
    Noor, Faria
    Ahmed, Chowdhury Farhan
    [J]. ADVANCES IN DATA MINING: APPLICATIONS AND THEORETICAL ASPECTS (ICDM 2018), 2018, 10933 : 215 - 229
  • [3] An Efficient Algorithm for Mining Maximal Frequent Sequential Patterns in Large Databases
    Su, Qiu-bin
    Lu, Lu
    Cheng, Bin
    [J]. 2018 INTERNATIONAL CONFERENCE ON COMMUNICATION, NETWORK AND ARTIFICIAL INTELLIGENCE (CNAI 2018), 2018, : 404 - 410
  • [4] Incremental mining of sequential patterns in large databases
    Masseglia, F
    Poncelet, P
    Teisseire, M
    [J]. DATA & KNOWLEDGE ENGINEERING, 2003, 46 (01) : 97 - 121
  • [5] Mining Rare Sequential Patterns in Large Transaction Databases
    Ouyang, Weimin
    [J]. PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ELECTRONIC TECHNOLOGY, 2016, 48 : 159 - 162
  • [6] Mining Weighted Closed Sequential Patterns in Large Databases
    Ren, Jia-Dong
    Yang, Jing
    Li, Yan
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 5, PROCEEDINGS, 2008, : 640 - 644
  • [7] Mining sequential patterns across multiple sequence databases
    Peng, Wen-Chih
    Liao, Zhung-Xun
    [J]. DATA & KNOWLEDGE ENGINEERING, 2009, 68 (10) : 1014 - 1033
  • [8] Mining Integrated Sequential Patterns From Multiple Databases
    Ezeife, Christie, I
    Aravindan, Vignesh
    Chaturvedi, Ritu
    [J]. INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2020, 16 (01) : 1 - 21
  • [9] Mining integrated sequential patterns from multiple databases
    Ezeife, Christie I.
    Aravindan, Vignesh
    Chaturvedi, Ritu
    [J]. International Journal of Data Warehousing and Mining, 2020, 16 (01) : 1 - 21
  • [10] Mining Positive and Negative Fuzzy Multiple Level Sequential Patterns in Large Transaction Databases
    Ouyang, Weimin
    Huang, Qinhua
    [J]. PROCEEDINGS OF THE 2009 WRI GLOBAL CONGRESS ON INTELLIGENT SYSTEMS, VOL I, 2009, : 500 - 504