XML clustering by principal component analysis

Cited by: 0
Authors
Liu, JH [1]
Wang, JTL [1]
Hsu, W [1]
Herbert, KG [1]
Affiliation
[1] New Jersey Inst Technol, Coll Comp Sci, Newark, NJ 07102 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
XML is increasingly important in data exchange and information management. Considerable effort has been devoted to developing efficient techniques for storing, querying, indexing, and accessing XML documents. In this paper we propose a new approach to clustering XML data. In contrast to previous work, which focused on documents defined by different DTDs, the proposed method works for documents that share the same DTD. Our approach extracts features from documents, modeled as ordered labeled trees, and transforms each document into a vector in a high-dimensional Euclidean space based on the occurrences of the features in the document. We then reduce the dimensionality of the vectors by principal component analysis (PCA) and cluster the vectors in the reduced-dimensional space. PCA makes it possible to identify vectors with co-occurring features, thereby improving the accuracy of the clustering. Experimental results on documents obtained from Wisconsin's XML data bank demonstrate the effectiveness and good performance of the proposed techniques.
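The pipeline summarized in the abstract (extract tree features, build occurrence-count vectors, reduce them with PCA, cluster in the reduced space) can be sketched in dependency-free Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which tree features are extracted or which clustering algorithm is applied, so the sketch assumes root-to-node label paths as features and plain k-means as the clusterer. The function names (`path_features`, `vectorize`, `pca`, `kmeans`, `cluster_xml`) and the toy `bib`/`book`/`article` documents are hypothetical.

```python
import random
import xml.etree.ElementTree as ET
from collections import Counter


def path_features(xml_text):
    """Count root-to-node label paths such as 'bib/book/author'.

    ASSUMPTION: label paths stand in for the paper's (unspecified) tree features.
    """
    counts = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        counts[path] += 1
        for child in node:  # children visited in document order (ordered tree)
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def vectorize(docs):
    """Turn each document into a vector of feature occurrence counts."""
    feats = [path_features(d) for d in docs]
    vocab = sorted(set().union(*feats))  # one dimension per distinct feature
    return [[float(f[p]) for p in vocab] for f in feats], vocab


def _matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def pca(X, k, iters=300, seed=0):
    """Project the rows of X onto (at most) the top-k principal components.

    Uses power iteration with deflation on the sample covariance matrix.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]  # center the data
    cov = [[sum(r[a] * r[b] for r in Xc) / max(n - 1, 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = 0.0
        for _ in range(iters):
            w = _matvec(cov, v)
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        if norm < 1e-12:
            break
        lam = sum(a * b for a, b in zip(v, _matvec(cov, v)))  # eigenvalue estimate
        comps.append(v)
        for a in range(d):  # deflate: remove this component's variance
            for b in range(d):
                cov[a][b] -= lam * v[a] * v[b]
    return [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in Xc]


def kmeans(Z, k, iters=100):
    """Plain k-means with deterministic farthest-first initialization."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [list(Z[0])]
    while len(centers) < k:  # next center: point farthest from existing ones
        far = max(Z, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in Z]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(Z, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_xml(docs, n_components=2, n_clusters=2):
    X, _ = vectorize(docs)
    return kmeans(pca(X, n_components), n_clusters)


if __name__ == "__main__":
    docs = [
        "<bib><book><title/><author/><author/></book></bib>",
        "<bib><book><title/><author/></book></bib>",
        "<bib><article><journal/><volume/></article></bib>",
        "<bib><article><journal/></article></bib>",
    ]
    print(cluster_xml(docs))  # one cluster label per document
```

PCA is computed here by power iteration with deflation only to keep the sketch free of third-party packages; with NumPy one would normally take an SVD of the centered data matrix instead. The farthest-first initialization is a choice made for this sketch so that the toy run is deterministic; it is not taken from the paper.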
Pages: 658-662
Page count: 5
Related papers
  • [1] XML clustering and retrieval through principal component analysis
    Wang, JTL
    Liu, JH
    Wang, JH
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2005, 14 (04) : 683 - 699
  • [2] Clustering and disjoint principal component analysis
    Vichi, Maurizio
    Saporta, Gilbert
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2009, 53 (08) : 3194 - 3208
  • [3] Principal component analysis and clustering on manifolds
    Mardia, Kanti V.
    Wiechers, Henrik
    Eltzner, Benjamin
    Huckemann, Stephan F.
    JOURNAL OF MULTIVARIATE ANALYSIS, 2022, 188
  • [4] XML document clustering by independent component analysis
    Wang, Tong
    Liu, Da-Xin
    Lin, Xuan-Zuo
    KNOWLEDGE DISCOVERY FROM XML DOCUMENTS, PROCEEDINGS, 2006, 3915 : 13 - 21
  • [5] Principal component analysis for clustering temporomandibular joint data
    Meng, Shuaishuai
    Fu, Yuzhuo
    Liu, Ting
    Li, Yi
    2015 8TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 1, 2015 : 422 - 425
  • [6] Principal Component Analysis based Feature Selection for clustering
    Xu, Jun-Ling
    Xu, Bao-Wen
    Zhang, Wei-Feng
    Cui, Zi-Feng
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008 : 460+
  • [7] A Quantum Principal Component Analysis Algorithm for Clustering Problems
    Liu, W
    Wang, B
    Chen, J
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2022, 59 (12) : 2858 - 2866
  • [8] Principal Component Analysis and Clustering Based Indoor Localization
    Liang, Dong
    Yang, Jingkang
    Xuan, Rui
    Zhang, Zhaojing
    Yang, Zhifang
    Shi, Kexin
    2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2015 : 1103 - 1108
  • [9] Principal component analysis for clustering gene expression data
    Yeung, KY
    Ruzzo, WL
    BIOINFORMATICS, 2001, 17 (09) : 763 - 774
  • [10] Effect of dimension reduction by principal component analysis on clustering
    Erisoglu, Murat
    Erisoglu, Ulku
    JOURNAL OF STATISTICS AND MANAGEMENT SYSTEMS, 2011, 14 (02) : 277 - 287