Clustering of XML Documents Based on Structure and Aggregated Content

Cited by: 0
Authors
Rezk, Nermeen Gamal [1]
Sarhan, Amany [2]
Algergawy, Alsayed [3]
Affiliations
[1] Kafr Elshiekh Univ, Elect Engn Dept, Fac Engn, Kafr Al Sheikh, Egypt
[2] Tanta Univ, Comp & Control Engn Dept, Fac Engn, Tanta, Egypt
[3] Friedrich Schiller Univ Jena, Inst Comp Sci, Jena, Germany
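Keywords
XML; aggregated content; structure similarity; content similarity; clustering
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract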
The main objective of this work is to improve clustering efficiency and performance on very large datasets. The paper aims to improve the quality of XML data clustering by exploiting more features extracted from source schemas. In particular, it proposes a clustering approach that uses both the content and the structure of XML documents to determine the similarity between them. Content similarity and structure similarity are computed with two different methods and then combined via a weight factor to give the overall document similarity. The structural similarity of XML data is derived from edge summaries, while the content similarity is derived from an aggregate of several similarity measures (Jaccard, cosine, and Jensen-Shannon divergence) in a single algorithm. We also experimented with Jaccard distance alone as the content measure, together with edge summaries, to show that aggregating content similarity measures further improves the results. The experiments show that clustering XML documents on structure-only information produces worse solutions in a homogeneous environment, while in a heterogeneous environment clustering produces better results when structure and content are combined. The results show that the proposed approach achieves better performance and quality than both the XEdge and XCLSC approaches.
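The weighted combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the three content measures are aggregated or how edge summaries are compared, so the sketch assumes an unweighted mean of Jaccard, cosine, and one minus the base-2 Jensen-Shannon divergence for content, and a Jaccard overlap of parent-child edge sets for structure. All function names and the default weight of 0.5 are hypothetical.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two collections, compared as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(p, q):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def js_similarity(p, q):
    """1 - Jensen-Shannon divergence (base 2, so the result lies in [0, 1])."""
    total_p, total_q = sum(p.values()), sum(q.values())
    if not total_p or not total_q:
        return 0.0
    divergence = 0.0
    for term in p.keys() | q.keys():
        pt, qt = p[term] / total_p, q[term] / total_q
        mt = (pt + qt) / 2
        if pt:
            divergence += 0.5 * pt * math.log2(pt / mt)
        if qt:
            divergence += 0.5 * qt * math.log2(qt / mt)
    return 1.0 - divergence


def content_similarity(tokens_a, tokens_b):
    """Assumed aggregation: unweighted mean of the three content measures."""
    p, q = Counter(tokens_a), Counter(tokens_b)
    scores = (jaccard(tokens_a, tokens_b), cosine(p, q), js_similarity(p, q))
    return sum(scores) / len(scores)


def structure_similarity(edges_a, edges_b):
    """Assumed stand-in for edge-summary comparison: Jaccard over edge sets."""
    return jaccard(edges_a, edges_b)


def overall_similarity(doc_a, doc_b, weight=0.5):
    """Weighted sum of structure and content similarity.

    Each document is an (edges, tokens) pair; weight is the share
    given to structure, with the remainder given to content.
    """
    (edges_a, tokens_a), (edges_b, tokens_b) = doc_a, doc_b
    structure = structure_similarity(edges_a, edges_b)
    content = content_similarity(tokens_a, tokens_b)
    return weight * structure + (1 - weight) * content


doc_a = ([("book", "title"), ("book", "author")], ["xml", "clustering", "xml"])
doc_b = ([("book", "title"), ("book", "year")], ["xml", "similarity"])
print(overall_similarity(doc_a, doc_b, weight=0.5))
```

Setting `weight` to 1.0 reduces the score to structure only, which corresponds to the structure-only baseline the abstract compares against; the resulting pairwise scores could then feed any standard clustering algorithm.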
Pages: 93-102
Page count: 10
Related papers
50 in total
  • [1] XCLSC: Structure and Content-based Clustering of XML Documents
    Bessine, Karima
    Nehar, Attia
    Cherroun, Hadda
    Moussaoui, Abdelouahab
    [J]. 2015 12TH IEEE INTERNATIONAL CONFERENCE ON PROGRAMMING AND SYSTEMS (ISPS), 2015, : 221 - 227
  • [2] Structure and Content Similarity for Clustering XML Documents
    Zhang, Lijun
    Li, Zhanhuai
    Chen, Qun
    Li, Ning
    [J]. WEB-AGE INFORMATION MANAGEMENT, 2010, 6185 : 116 - 124
  • [3] Clustering XML Documents by Combining Content and Structure
    Guo Yongming
    Chen Dehua
    Le Jiajin
    [J]. ISISE 2008: INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING, VOL 1, 2008, : 583 - 587
  • [4] FXProj - A Fuzzy XML Documents Projected Clustering Based on Structure and Content
    Ji, Tengfei
    Bao, Xiaoyuan
    Yang, Dongqing
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PT I, 2011, 7120 : 406 - 419
  • [5] Clustering XML documents by structure
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    [J]. METHODS AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3025 : 112 - 121
  • [6] Clustering XML Documents by Structure
    Lesniewska, Anna
    [J]. ADVANCES IN DATABASES AND INFORMATION SYSTEMS, 2010, 5968 : 238 - 246
  • [7] Clustering XML Documents Using Structure and Content based on a New Similarity Function OverallSimSUX
    Magdaleno, Damny
    Fuentes, Vett E.
    Garcia, Maria M.
    [J]. COMPUTACION Y SISTEMAS, 2015, 19 (01): : 151 - 161
  • [8] Clustering XML documents by structure based on common neighbor
    Zhang, XZ
    Lv, TY
    Wang, ZX
    Zuo, WL
    [J]. COMPUTATIONAL INTELLIGENCE AND SECURITY, PT 1, PROCEEDINGS, 2005, 3801 : 771 - 776
  • [9] A methodology for clustering XML documents by structure
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    [J]. INFORMATION SYSTEMS, 2006, 31 (03) : 187 - 228
  • [10] Clustering and retrieval of XML documents by structure
    Hwang, JH
    Ryu, KH
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2005, PT 2, 2005, 3481 : 925 - 935