Clustering XML documents by structure based on common neighbor

Cited by: 0
Authors
Zhang, XZ [1]
Lv, TY
Wang, ZX
Zuo, WL
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Changchun 130023, Peoples R China
[2] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin, Peoples R China
Keywords
XML structure; clustering; common neighbor
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering XML documents is an important task, but selecting appropriate parameter values for clustering algorithms is difficult. Moreover, current clustering algorithms lack an effective mechanism for detecting outliers and simply treat them as "noise". By integrating outlier detection with clustering, the paper takes a new approach to analyzing XML documents by structure. After introducing the concept of a common-neighbor-based outlier, the paper proposes a new clustering algorithm that uses the outlier information to stop clustering automatically and needs only one parameter, whose appropriate value range is determined during the outlier mining process. After discussing some features of the proposed algorithm, the paper compares it with other clustering algorithms on an XML dataset with different structures and on other real-life datasets.
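The abstract does not define "common neighbor" formally, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes each XML document is reduced to a set of element paths, that two documents are neighbors when their Jaccard similarity reaches a threshold `theta`, and that a document sharing no neighbors with any other document is a candidate outlier. All names (`SAMPLE_DOCS`, `jaccard`, `neighbors`, `common_neighbor_counts`, `low_support_documents`) and the sample data are hypothetical.

```python
from itertools import combinations

# Hypothetical sample: each document is represented by its set of element paths.
SAMPLE_DOCS = {
    "book1": {"/book/title", "/book/author", "/book/isbn"},
    "book2": {"/book/title", "/book/author", "/book/year"},
    "book3": {"/book/title", "/book/author", "/book/isbn", "/book/year"},
    "cd1": {"/cd/title", "/cd/artist"},
    "cd2": {"/cd/title", "/cd/artist", "/cd/label"},
    "cd3": {"/cd/title", "/cd/artist", "/cd/year"},
    "odd": {"/invoice/total"},
}


def jaccard(a, b):
    """Jaccard similarity of two path sets (assumed structural similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def neighbors(docs, theta):
    """Map each document to the other documents with similarity >= theta."""
    return {
        name: {
            other
            for other in docs
            if other != name and jaccard(docs[name], docs[other]) >= theta
        }
        for name in docs
    }


def common_neighbor_counts(nbrs):
    """Count the neighbors shared by every pair of documents."""
    return {
        (a, b): len(nbrs[a] & nbrs[b])
        for a, b in combinations(sorted(nbrs), 2)
    }


def low_support_documents(nbrs, min_common=1):
    """Flag documents that share fewer than min_common neighbors with all others.

    This is an assumed, simplified outlier criterion for illustration only.
    """
    best = dict.fromkeys(nbrs, 0)
    for (a, b), count in common_neighbor_counts(nbrs).items():
        best[a] = max(best[a], count)
        best[b] = max(best[b], count)
    return {name for name, count in best.items() if count < min_common}
```

In this sketch `theta` is the single tunable parameter; calling `neighbors(SAMPLE_DOCS, 0.5)` and passing the result to `low_support_documents` separates the structurally isolated document from the two groups. How the paper actually derives its parameter range from outlier mining is not stated in the abstract.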
Pages: 771-776
Page count: 6