A tree-based approach to clustering XML documents by structure

Cited by: 0
Authors
Costa, G
Manco, G
Ortale, R
Tagarelli, A
Affiliations
[1] Inst Italian Natl Res Council, CNR, ICAR, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, DEIS, I-87036 Arcavacata Di Rende, CS, Italy
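Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract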
We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The idea is to equip each cluster with an XML cluster representative, i.e. an XML document subsuming the most typical structural specifics of a set of XML documents. Clustering is essentially accomplished by comparing cluster representatives, and updating the representatives as soon as new clusters are detected. We present an algorithm for the computation of an XML representative based on suitable techniques for identifying significant node matchings and for reliably merging and pruning XML trees. Experimental evaluation performed on both synthetic and real data shows the effectiveness of our approach.
Pages: 137 - 148
Page count: 12
Related papers
50 in total
  • [41] Tree-based text chat using XML-based messages
    Kim, K
    [J]. IC'04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTERNET COMPUTING, VOLS 1 AND 2, 2004, : 669 - 675
  • [42] Fast Tree-Based Classification via Homogeneous Clustering
    Pardis, George
    Diamantaras, Konstantinos I.
    Ougiaroglou, Stefanos
    Evangelidis, Georgios
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2019, PT I, 2019, 11871 : 514 - 524
  • [43] Tree-Based Algorithm for Stable and Efficient Data Clustering
    Aljabbouli, Hasan
    Albizri, Abdullah
    Harfouche, Antoine
    [J]. INFORMATICS-BASEL, 2020, 7 (04)
  • [44] The unreasonable effectiveness of tree-based theory for networks with clustering
    Melnik, Sergey
    Hackett, Adam
    Porter, Mason A.
    Mucha, Peter J.
    Gleeson, James P.
    [J]. PHYSICAL REVIEW E, 2011, 83 (03)
  • [45] treeClust: An R Package for Tree-Based Clustering Dissimilarities
    Buttrey, Samuel E.
    Whitaker, Lyn R.
    [J]. R JOURNAL, 2015, 7 (02): 227 - 236
  • [46] XML Documents Clustering Algorithm Based on Cluster Core And LSPX
    Zhao, Di
    Fu, HaiDong
    Ren, Hui
    Wei, Mengxue
    Chu, Jie
    [J]. PROCEEDINGS OF THE 2017 12TH IEEE CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2017, : 1027 - 1032
  • [47] Algorithms for Clustering XML Documents: A Review
    Gulati, Shagun
    Munjal, Geetika
    [J]. 2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER ENGINEERING AND APPLICATIONS (ICACEA), 2015, : 654 - 658
  • [48] A robust clustering method for XML documents
    Zhao, Bin
    Zhang, Yong-Sheng
    Zhang, Hua-Xiang
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION MANAGEMENT, INNOVATION MANAGEMENT AND INDUSTRIAL ENGINEERING, VOL 1, 2008, : 19 - 23
  • [49] Clustering large scale of XML documents
    Wang, Tong
    Liu, Da-Xin
    Lin, Xuan-Zuo
    Sun, Wei
    Ahmad, Gufran
    [J]. ADVANCES IN GRID AND PERVASIVE COMPUTING, PROCEEDINGS, 2006, 3947 : 447 - 455
  • [50] Mining Intensional Information for answering XML-Queries using Tree-based Association Rules Approach
    Mahalakshmi, S. Devi
    Vijayalakshmi, K.
    Muneeswaran, K.
    Priyanka, G.
    [J]. 2013 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2013, : 168 - 174