XML clustering by principal component analysis

Authors
Liu, JH [1 ]
Wang, JTL [1 ]
Hsu, W [1 ]
Herbert, KG [1 ]
Affiliation
[1] New Jersey Inst Technol, Coll Comp Sci, Newark, NJ 07102 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
XML is increasingly important in data exchange and information management. Considerable effort has been spent on developing efficient techniques for storing, querying, indexing and accessing XML documents. In this paper we propose a new approach to clustering XML data. In contrast to previous work, which focused on documents defined by different DTDs, the proposed method works for documents with the same DTD. Our approach is to extract features from documents, modeled as ordered labeled trees, and transform the documents into vectors in a high-dimensional Euclidean space based on the occurrences of the features in the documents. We then reduce the dimensionality of the vectors by principal component analysis (PCA) and cluster the vectors in the reduced-dimensional space. PCA enables one to identify vectors with co-occurring features, thereby enhancing the accuracy of the clustering. Experimental results based on documents obtained from Wisconsin's XML data bank show the effectiveness and good performance of the proposed techniques.
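The pipeline in the abstract (features from trees, occurrence vectors, PCA, clustering in the reduced space) can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: features are taken to be parent/child tag pairs, the reduction keeps a single principal component found by power iteration, and a simple two-cluster 1-D k-means stands in for whatever clustering method the paper uses. All function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def edge_features(xml_text):
    """Count parent/child tag pairs in one document (assumed feature type)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for parent in root.iter():
        for child in parent:
            counts[(parent.tag, child.tag)] += 1
    return counts


def to_vectors(docs):
    """Map each document to a vector of feature occurrence counts."""
    counted = [edge_features(d) for d in docs]
    vocab = sorted({f for c in counted for f in c})
    return [[float(c[f]) for f in vocab] for c in counted]


def center(rows):
    """Subtract the per-feature mean, as PCA requires."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_component(rows, iters=100):
    """Leading principal direction via power iteration on X^T X."""
    dim = len(rows[0])
    v = [float(i + 1) for i in range(dim)]  # arbitrary non-zero start
    for _ in range(iters):
        scores = [sum(x * w for x, w in zip(row, v)) for row in rows]
        nxt = [sum(s * row[j] for s, row in zip(scores, rows))
               for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # all documents identical: nothing to separate
            break
        v = [x / norm for x in nxt]
    return v


def two_means_1d(values, iters=20):
    """Two-cluster k-means on scalars (a stand-in clustering step)."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(x - lo) <= abs(x - hi) else 1 for x in values]
        left = [x for x, lab in zip(values, labels) if lab == 0]
        right = [x for x, lab in zip(values, labels) if lab == 1]
        if left:
            lo = sum(left) / len(left)
        if right:
            hi = sum(right) / len(right)
    return labels


def cluster_documents(docs):
    """Vectorize, project onto the first component, then cluster."""
    rows = center(to_vectors(docs))
    v = first_component(rows)
    scores = [sum(x * w for x, w in zip(row, v)) for row in rows]
    return two_means_1d(scores)


# Toy documents sharing one schema with optional elements (made-up data).
docs = [
    "<rec><title/><author/><author/><journal/><volume/></rec>",
    "<rec><title/><author/><journal/><volume/></rec>",
    "<rec><title/><author/><booktitle/><pages/></rec>",
    "<rec><title/><author/><author/><author/><booktitle/><pages/></rec>",
]
labels = cluster_documents(docs)
```

In this toy input, `journal` and `volume` always occur together, as do `booktitle` and `pages`, so the first principal component weights those co-occurring features jointly and the two kinds of record fall on opposite sides of it. Which group receives label 0 depends on the sign of the component, so only the grouping is meaningful.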
Pages: 658-662
Number of pages: 5