CLS and CLS Close: The Scalable Method for Mining the Semi Structured Data Set

Cited by: 0
Authors
Gaol, Ford Lumban [1]
Widjaja, Belawati H. [1]
Affiliations
[1] Univ Indonesia, Fac Comp Sci, Bogor, Indonesia
Keywords
frequent pattern; closed pattern; graph mining; CLS code; canonical label;
DOI
10.1007/978-1-4020-8735-6_35
Chinese Library Classification
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Semistructured patterns can be formally modeled as graph patterns. The most important problem in mining large semistructured datasets is the scalability of the method. Following the successful development of efficient and scalable algorithms for mining frequent itemsets and sequences, it is natural to extend the scope of study to a more general pattern mining problem: mining frequent semistructured patterns, or graph patterns. In this paper, we extend the pattern-growth methodology and develop a novel algorithm called CLS (Canonical Labeling System), which discovers frequent connected subgraphs efficiently using either a depth-first or a breadth-first search strategy. A novel canonical labeling system and search order are devised to support efficient pattern growth. CLS is simpler and more efficient than other methods because it combines pattern growing and pattern checking into one procedure. Based on CLS, we develop CLS Close to mine closed frequent graphs, which not only eliminates redundant patterns but also substantially increases the efficiency of mining, especially in the presence of large graph patterns.
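
The abstract does not give the CLS code itself, so the following is only an illustrative sketch of the two general ideas it names, not the authors' algorithm. `canonical_label` is a hypothetical brute-force canonical label (the smallest encoding over all vertex orderings), which is practical only for very small graphs. `closed_only` applies the usual definition of a closed pattern: a pattern is kept only if no proper super-pattern has the same support. The function names, the graph encoding, and the toy data are all assumptions made for illustration.

```python
from itertools import permutations


def canonical_label(vertex_labels, edges):
    """Brute-force canonical label for a small undirected labeled graph.

    vertex_labels: list of string labels; the list index is the vertex id.
    edges: iterable of (u, v, edge_label) tuples.
    Returns the lexicographically smallest (labels, edge code) pair over all
    vertex renumberings, so isomorphic graphs receive the same label.
    This is NOT the CLS code from the paper, only a stand-in.
    """
    n = len(vertex_labels)
    best = None
    for perm in permutations(range(n)):  # perm[old_id] == new_id
        # Vertex labels listed in order of their new ids.
        by_new_id = sorted(range(n), key=lambda old: perm[old])
        labels = tuple(vertex_labels[old] for old in by_new_id)
        # Each edge rewritten with new ids, endpoints ordered, then sorted.
        code = tuple(sorted(
            (min(perm[u], perm[v]), max(perm[u], perm[v]), lbl)
            for u, v, lbl in edges
        ))
        candidate = (labels, code)
        if best is None or candidate < best:
            best = candidate
    return best


def closed_only(support, is_proper_subpattern):
    """Keep patterns with no proper super-pattern of equal support."""
    return {
        p for p in support
        if not any(
            is_proper_subpattern(p, q) and support[p] == support[q]
            for q in support
        )
    }


# Toy graphs: g1 and g2 are the same path A -x- B -y- C with different ids.
g1 = (["A", "B", "C"], [(0, 1, "x"), (1, 2, "y")])
g2 = (["C", "B", "A"], [(0, 1, "y"), (1, 2, "x")])
g3 = (["A", "B", "C"], [(0, 1, "y"), (1, 2, "x")])  # edge labels swapped

# Toy supports. Assumption: vertex labels are unique, so a pattern can be
# stored as a set of labeled edges and containment is plain set inclusion.
ab = frozenset({("A", "B")})
bc = frozenset({("B", "C")})
abc = frozenset({("A", "B"), ("B", "C")})
support = {ab: 3, bc: 4, abc: 3}
closed = closed_only(support, lambda p, q: p < q)
```

In this toy data, `ab` is dropped because its super-pattern `abc` occurs equally often, while `bc` is kept because it occurs more often than `abc`. A real miner would replace the factorial-time labeling and the set-inclusion test with a proper canonical code and a subgraph-isomorphism check.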
Pages: 186 / +
Number of pages: 2