XML clustering by principal component analysis

Authors
Liu, JH [1 ]
Wang, JTL [1 ]
Hsu, W [1 ]
Herbert, KG [1 ]
Affiliations
[1] New Jersey Inst Technol, Coll Comp Sci, Newark, NJ 07102 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
XML is increasingly important in data exchange and information management. Considerable effort has been spent on developing efficient techniques for storing, querying, indexing and accessing XML documents. In this paper we propose a new approach to clustering XML data. In contrast to previous work, which focused on documents defined by different DTDs, the proposed method works for documents with the same DTD. Our approach is to extract features from documents, modeled as ordered labeled trees, and transform the documents into vectors in a high-dimensional Euclidean space based on the occurrences of the features in the documents. We then reduce the dimensionality of the vectors by principal component analysis (PCA) and cluster the vectors in the reduced-dimensional space. PCA enables one to identify vectors with co-occurring features, thereby enhancing the accuracy of the clustering. Experimental results based on documents obtained from Wisconsin's XML data bank show the effectiveness and good performance of the proposed techniques.
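The pipeline outlined in the abstract (feature extraction, occurrence vectors, PCA, clustering) can be illustrated with a minimal Python sketch. The abstract does not say which tree features or which clustering algorithm the authors use, so everything specific here is an assumption made for illustration: features are parent/child tag pairs, PCA is computed with NumPy's SVD, clustering is a plain k-means with farthest-point seeding, and the four toy documents and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

import numpy as np


def edge_features(xml_text):
    """Count parent/child tag pairs in one document (an assumed feature type)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for parent in root.iter():
        for child in parent:
            counts[(parent.tag, child.tag)] += 1
    return counts


def vectorize(docs):
    """Map documents to occurrence-count vectors over the shared feature set."""
    per_doc = [edge_features(d) for d in docs]
    vocab = sorted(set().union(*per_doc))
    X = np.array([[c[f] for f in vocab] for c in per_doc], dtype=float)
    return X, vocab


def pca_reduce(X, k):
    """Project the mean-centered rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def kmeans(Z, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [Z[0]]
    for _ in range(k - 1):
        # Pick the point farthest from its nearest already-chosen center.
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(Z[:, None] - centers[None], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


# Hypothetical documents that share one schema but differ in structure.
docs = [
    "<paper><title/><author/><author/><author/></paper>",
    "<paper><title/><author/><author/><author/><author/></paper>",
    "<paper><title/><editor/><editor/><year/></paper>",
    "<paper><title/><editor/><editor/><editor/><year/></paper>",
]

X, vocab = vectorize(docs)   # one row per document, one column per feature
Z = pca_reduce(X, 2)         # reduced-dimensional representation
labels = kmeans(Z, 2)        # cluster assignment per document
print(vocab)
print(labels)
```

In this toy setting the author-heavy and editor-heavy documents should end up in separate clusters, because the features that co-occur within each group dominate the leading principal component. Real documents would need richer tree features than tag pairs, and the number of retained components and clusters would have to be chosen from the data.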
Pages: 658-662
Page count: 5