A weighted common structure based clustering technique for XML documents

Cited by: 11
Authors
Hwang, Jeong Hee [2 ]
Ryu, Keun Ho [1 ]
Affiliations
[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Database Bioinformat Lab, Cheongju 361763, Chungbuk, South Korea
[2] Namseoul Univ, Dept Comp Sci, Cheonan 331707, Chungnam, South Korea
Keywords
Data mining; XML mining; Document clustering; XML clustering
DOI
10.1016/j.jss.2010.02.004
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
XML has recently become very popular as a means of representing semistructured data and as a standard for data exchange over the Web, because of its varied applicability in numerous applications. Therefore, XML documents constitute an important data mining domain. In this paper, we propose a new method of XML document clustering by a global criterion function, considering the weight of common structures. Our approach initially extracts representative structures of frequent patterns from schemaless XML documents using a sequential pattern mining algorithm. Then, we perform clustering of an XML document by the weight of common structures, without a measure of pairwise similarity, assuming that an XML document is a transaction and frequent structures extracted from documents are items of the transaction. We conducted experiments to compare our method with previous methods. The experimental results show the effectiveness of our approach. Crown Copyright (C) 2010 Published by Elsevier Inc. All rights reserved.
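The idea of treating each XML document as a transaction whose items are frequent structures, and assigning documents to clusters by a global criterion rather than by pairwise similarity, can be illustrated with a minimal sketch. The abstract does not give the paper's criterion function, so the code below substitutes a CLOPE-style profit function as a stand-in. The names `profit` and `cluster`, the repulsion parameter `r`, the optional `weights` mapping, and the path strings used as structure identifiers are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def profit(clusters, r=2.0, weights=None):
    """CLOPE-style global criterion over all clusters (higher is better).

    Assumption: this is a stand-in for the paper's criterion function.
    Each cluster is a list of documents; each document is a set of
    frequent-structure identifiers (the "items" of a transaction).
    """
    total_docs = sum(len(docs) for docs in clusters)
    if total_docs == 0:
        return 0.0
    score = 0.0
    for docs in clusters:
        if not docs:
            continue
        counts = Counter(s for doc in docs for s in doc)
        # Weighted number of structure occurrences in the cluster.
        # Hypothetical weighting: every structure counts 1.0 by default.
        size = sum((weights or {}).get(s, 1.0) * n for s, n in counts.items())
        width = len(counts)  # number of distinct structures
        score += size / (width ** r) * len(docs)
    return score / total_docs


def cluster(transactions, r=2.0, weights=None):
    """Greedy single pass: put each document where the criterion is largest."""
    clusters, labels = [], []
    for doc in transactions:
        clusters.append([])  # candidate: open a new cluster
        best_idx, best_score = 0, float("-inf")
        for idx, docs in enumerate(clusters):
            docs.append(doc)  # tentatively place the document
            score = profit(clusters, r, weights)
            docs.pop()
            if score > best_score:
                best_idx, best_score = idx, score
        clusters[best_idx].append(doc)
        if not clusters[-1]:  # the candidate cluster was not used
            clusters.pop()
        labels.append(best_idx)
    return labels


# Hypothetical frequent structures, written as element paths.
docs = [
    {"book/title", "book/author"},
    {"book/title", "book/author"},
    {"order/item", "order/price"},
    {"order/item", "order/price"},
]
print(cluster(docs))  # [0, 0, 1, 1]
```

No document is ever compared directly with another document here. Each placement is judged only by how it changes the criterion computed over all clusters, which is the property the abstract emphasizes.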
Pages: 1267-1274
Page count: 8
Related papers
50 in total
  • [31] Clustering XML documents by patterns
    Maciej Piernik
    Dariusz Brzezinski
    Tadeusz Morzy
    [J]. Knowledge and Information Systems, 2016, 46 : 185 - 212
  • [32] An Efficient Association Rule Based Clustering of XML Documents
    Muralidhar, A.
    Pattabiraman, V.
    [J]. BIG DATA, CLOUD AND COMPUTING CHALLENGES, 2015, 50 : 401 - 407
  • [33] Clustering Algorithm Based on Semantic Distance for XML Documents
    Yang, Lingxian
    Gu, Jinguang
    Chen, Heping
    [J]. FIRST INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, : 549 - +
  • [34] Clustering XML Documents Using Structure and Content based on a New Similarity Function OverallSimSUX
    Magdaleno, Damny
    Fuentes, Vett E.
    Garcia, Maria M.
    [J]. COMPUTACION Y SISTEMAS, 2015, 19 (01): : 151 - 161
  • [35] XML Documents Clustering Algorithm Based on Cluster Core And LSPX
    Zhao, Di
    Fu, HaiDong
    Ren, Hui
    Wei, Mengxue
    Chu, Jie
    [J]. PROCEEDINGS OF THE 2017 12TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2017, : 1027 - 1032
  • [36] A clustering approach for XML linked documents
    Catania, B
    Maddalena, A
    [J]. 13TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2002, : 121 - 125
  • [37] Algorithms for Clustering XML Documents: A Review
    Gulati, Shagun
    Munjal, Geetika
    [J]. 2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER ENGINEERING AND APPLICATIONS (ICACEA), 2015, : 654 - 658
  • [38] A robust clustering method for XML documents
    Zhao, Bin
    Zhang, Yong-Sheng
    Zhang, Hua-Xiang
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT, INNOVATION MANAGEMENT AND INDUSTRIAL ENGINEERING, VOL 1, 2008, : 19 - 23
  • [39] Clustering large scale of XML documents
    Wang, Tong
    Liu, Da-Xin
    Lin, Xuan-Zuo
    Sun, Wei
    Ahmad, Gufran
    [J]. ADVANCES IN GRID AND PERVASIVE COMPUTING, PROCEEDINGS, 2006, 3947 : 447 - 455
  • [40] Similarity Evaluation of XML Documents Based on Weighted Element Tree Model
    Wang, Chenying
    Yuan, Xiaojie
    Ning, Hua
    Lian, Xin
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2009, 5678 : 680 - 687