A weighted common structure based clustering technique for XML documents

Cited by: 11
Authors
Hwang, Jeong Hee [2]
Ryu, Keun Ho [1]
Affiliations
[1] Chungbuk Natl Univ, Sch Elect & Comp Engn, Database Bioinformat Lab, Cheongju 361763, Chungbuk, South Korea
[2] Namseoul Univ, Dept Comp Sci, Cheonan 331707, Chungnam, South Korea
Keywords
Data mining; XML mining; Document clustering; XML clustering
DOI
10.1016/j.jss.2010.02.004
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
XML has recently become very popular as a means of representing semistructured data and as a standard for data exchange over the Web, because of its varied applicability in numerous applications. Therefore, XML documents constitute an important data mining domain. In this paper, we propose a new method of XML document clustering by a global criterion function, considering the weight of common structures. Our approach initially extracts representative structures of frequent patterns from schemaless XML documents using a sequential pattern mining algorithm. Then, we perform clustering of an XML document by the weight of common structures, without a measure of pairwise similarity, assuming that an XML document is a transaction and frequent structures extracted from documents are items of the transaction. We conducted experiments to compare our method with previous methods. The experimental results show the effectiveness of our approach. Crown Copyright (C) 2010 Published by Elsevier Inc. All rights reserved.
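The transaction view described in the abstract (each document is a transaction, each frequent structure an item, and clusters are chosen by a global criterion rather than by pairwise similarity) can be illustrated with a small sketch. The criterion below is a CLOPE-style score swapped in for illustration; it is not the weighted common-structure function proposed in the paper, which this record does not give. The path strings, the function names, and the repulsion parameter `r` are all assumptions, and the frequent structures are assumed to have been mined already.

```python
from collections import Counter


def cluster_score(item_counts, n_docs, r):
    """CLOPE-style term for one cluster: S / W**r * n_docs.

    S is the total number of item occurrences in the cluster and
    W is the number of distinct items (illustrative criterion only).
    """
    width = len(item_counts)
    if n_docs == 0 or width == 0:
        return 0.0
    size = sum(item_counts.values())
    return size / (width ** r) * n_docs


def cluster_transactions(docs, r=2.0):
    """Greedily assign each document to the cluster with the largest score gain.

    docs: iterable of collections of structure identifiers (hypothetical paths).
    Returns one cluster label per document, in input order.
    """
    clusters = []  # each entry: {"counts": Counter, "n": int}
    labels = []
    for doc in docs:
        items = Counter(set(doc))
        # Baseline option: open a new cluster containing only this document.
        best_gain = cluster_score(items, 1, r)
        best_idx = len(clusters)
        for idx, c in enumerate(clusters):
            before = cluster_score(c["counts"], c["n"], r)
            after = cluster_score(c["counts"] + items, c["n"] + 1, r)
            if after - before > best_gain:
                best_gain, best_idx = after - before, idx
        if best_idx == len(clusters):
            clusters.append({"counts": Counter(), "n": 0})
        clusters[best_idx]["counts"].update(items)
        clusters[best_idx]["n"] += 1
        labels.append(best_idx)
    return labels


# Hypothetical frequent structures, written as element paths.
docs = [
    {"book/title", "book/author", "book/year"},
    {"book/title", "book/author", "book/year"},
    {"cd/artist", "cd/track"},
    {"cd/artist", "cd/track"},
]
labels = cluster_transactions(docs)
```

No document is ever compared directly with another document: each assignment depends only on per-cluster item counts, which is what keeps a single pass over the collection cheap.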
Pages: 1267-1274
Number of pages: 8
Related papers
50 in total
  • [1] Clustering XML documents by structure based on common neighbor
    Zhang, XZ
    Lv, TY
    Wang, ZX
    Zuo, WL
    [J]. COMPUTATIONAL INTELLIGENCE AND SECURITY, PT 1, PROCEEDINGS, 2005, 3801 : 771 - 776
  • [2] All common embedded subtrees for clustering XML documents by structure
    Lin, Zhiwei
    Wang, Hui
    McClean, Sally
    Wang, Haiying
    [J]. PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 13 - 18
  • [3] Structural-based clustering technique of XML documents
    [J]. 1600, IEEE Computer Society
  • [4] Structural-based Clustering Technique of XML Documents
    Posonia, Mary A.
    Jyothi, V. L.
    [J]. PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON CIRCUITS, POWER AND COMPUTING TECHNOLOGIES (ICCPCT 2013), 2013, : 1239 - 1242
  • [5] Clustering XML documents by structure
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    [J]. METHODS AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3025 : 112 - 121
  • [6] Clustering XML Documents by Structure
    Lesniewska, Anna
    [J]. ADVANCES IN DATABASES AND INFORMATION SYSTEMS, 2010, 5968 : 238 - 246
  • [7] Clustering of XML Documents Based on Structure and Aggregated Content
    Rezk, Nermeen Gamal
    Sarhan, Amany
    Algergawy, Alsayed
    [J]. PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2016, : 93 - 102
  • [8] Clustering Homogeneous XML Documents Using Weighted Similarities on XML Attributes
    Nagwani, Naresh Kumar
    Bhansali, Ashok
    [J]. 2010 IEEE 2ND INTERNATIONAL ADVANCE COMPUTING CONFERENCE, 2010, : 369 - +
  • [9] A methodology for clustering XML documents by structure
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    [J]. INFORMATION SYSTEMS, 2006, 31 (03) : 187 - 228
  • [10] Clustering and retrieval of XML documents by structure
    Hwang, JH
    Ryu, KH
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2005, PT 2, 2005, 3481 : 925 - 935