XML clustering by principal component analysis

Cited by: 0
Authors
Liu, JH [1]
Wang, JTL [1]
Hsu, W [1]
Herbert, KG [1]
Affiliation
[1] New Jersey Inst Technol, Coll Comp Sci, Newark, NJ 07102 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
XML is increasingly important in data exchange and information management. Considerable effort has been devoted to developing efficient techniques for storing, querying, indexing and accessing XML documents. In this paper we propose a new approach to clustering XML data. In contrast to previous work, which focused on documents defined by different DTDs, the proposed method works for documents with the same DTD. Our approach extracts features from documents, modeled as ordered labeled trees, and transforms the documents into vectors in a high-dimensional Euclidean space based on the occurrences of the features in the documents. We then reduce the dimensionality of the vectors by principal component analysis (PCA) and cluster the vectors in the reduced-dimensional space. PCA makes it possible to identify vectors with co-occurring features, thereby enhancing the accuracy of the clustering. Experimental results on documents obtained from Wisconsin's XML data bank show the effectiveness and good performance of the proposed techniques.
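The pipeline described in the abstract (feature extraction from trees, occurrence vectors, PCA, clustering in the reduced space) can be illustrated with a minimal sketch. Several choices here are assumptions and not taken from the paper: the features are parent/child tag pairs, the clustering step is a plain k-means with farthest-point initialization, and the four toy documents and all function names are hypothetical. The abstract does not say which features or which clustering algorithm the authors used.

```python
import xml.etree.ElementTree as ET
from collections import Counter

import numpy as np


def edge_features(xml_text):
    """Count parent/child tag pairs in one document (assumed feature type)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for parent in root.iter():
        for child in parent:
            counts[(parent.tag, child.tag)] += 1
    return counts


def to_vectors(docs):
    """Map each document to a vector of feature occurrence counts."""
    counts = [edge_features(d) for d in docs]
    vocab = sorted(set().union(*counts))
    X = np.array([[c[f] for f in vocab] for c in counts], dtype=float)
    return X, vocab


def pca_reduce(X, k):
    """Project centered data onto its first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def kmeans(Z, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization
    (an assumed stand-in for the unspecified clustering step)."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


# Hypothetical toy documents that would all conform to one DTD.
docs = [
    "<paper><title/><author/><author/></paper>",
    "<paper><title/><author/><author/><author/></paper>",
    "<paper><title/><cite/><cite/><cite/></paper>",
    "<paper><title/><cite/><cite/><cite/><cite/></paper>",
]

X, vocab = to_vectors(docs)
Z = pca_reduce(X, k=2)
labels = kmeans(Z, k=2)
print(vocab)
print(labels)
```

In this toy input the author-heavy documents and the citation-heavy documents differ only in how often each feature occurs, which is the situation the abstract targets: same DTD, different content structure.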
Pages: 658-662
Page count: 5