A tree-based approach to clustering XML documents by structure

Cited by: 0
Authors
Costa, G
Manco, G
Ortale, R
Tagarelli, A
Affiliations
[1] Inst Italian Natl Res Council, CNR, ICAR, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, CS, Italy
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The idea is to equip each cluster with an XML cluster representative, i.e. an XML document subsuming the most typical structural specifics of a set of XML documents. Clustering is essentially accomplished by comparing cluster representatives, and updating the representatives as soon as new clusters are detected. We present an algorithm for the computation of an XML representative based on suitable techniques for identifying significant node matchings and for reliably merging and pruning XML trees. Experimental evaluation performed on both synthetic and real data shows the effectiveness of our approach.
Pages: 137-148
Page count: 12
Related papers
  • [1] A Huffman Tree-Based Algorithm for Clustering Documents
    Liu, Yaqiong
    Wen, Yuzhuo
    Yuan, Dingrong
    Cuan, Yuwei
    [J]. ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014, 2014, 8933 : 630 - 640
  • [2] Clustering XML documents by structure
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    [J]. METHODS AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3025 : 112 - 121
  • [3] Clustering XML Documents by Structure
    Lesniewska, Anna
    [J]. ADVANCES IN DATABASES AND INFORMATION SYSTEMS, 2010, 5968 : 238 - 246
  • [4] Clustering of XML Documents Based on Structure and Aggregated Content
    Rezk, Nermeen Gamal
    Sarhan, Amany
    Algergawy, Alsayed
    [J]. PROCEEDINGS OF 2016 11TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2016, : 93 - 102
  • [5] Clustering XML documents by structure based on common neighbor
    Zhang, XZ
    Lv, TY
    Wang, ZX
    Zuo, WL
    [J]. COMPUTATIONAL INTELLIGENCE AND SECURITY, PT 1, PROCEEDINGS, 2005, 3801 : 771 - 776
  • [6] A clustering approach for XML linked documents
    Catania, B
    Maddalena, A
    [J]. 13TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2002, : 121 - 125
  • [7] A methodology for clustering XML documents by structure
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    [J]. INFORMATION SYSTEMS, 2006, 31 (03) : 187 - 228
  • [8] Clustering and retrieval of XML documents by structure
    Hwang, JH
    Ryu, KH
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2005, PT 2, 2005, 3481 : 925 - 935
  • [9] XCLSC: Structure and Content-based Clustering of XML Documents
    Bessine, Karima
    Nehar, Attia
    Cherroun, Hadda
    Moussaoui, Abdelouahab
    [J]. 2015 12TH IEEE INTERNATIONAL CONFERENCE ON PROGRAMMING AND SYSTEMS (ISPS), 2015, : 221 - 227
  • [10] A weighted common structure based clustering technique for XML documents
    Hwang, Jeong Hee
    Ryu, Keun Ho
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2010, 83 (07) : 1267 - 1274