An efficient algorithm for clustering XML schemas

Authors
Rhim, TW
Lee, KH
Ko, MC
Affiliations
[1] Yonsei Univ, Dept Comp Sci, Sudaemoon Ku, Seoul 120749, South Korea
[2] Konkuk Univ, Dept Comp Sci, Chungju 380701, Chungbuk, South Korea
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Schema clustering is an important prerequisite for integrating XML schemas. This paper presents an efficient method for clustering XML schemas. The proposed method first computes pairwise similarities among schemas. Similarity is defined as the size of the structure common to two schemas, on the assumption that schemas that cost less to integrate are more similar. Specifically, we extract one-to-one matchings between paths that maximize the number of corresponding elements. Finally, a hierarchical clustering method is applied to the similarity values. Experimental results on many XML schemas show that the method outperforms previous work, achieving on average a precision of 98% and a clustering rate of 95%.
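
The pipeline outlined in the abstract (path matching, a similarity score, then hierarchical clustering) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: schemas are modeled as lists of root-to-leaf paths of element names, "corresponding elements" are counted with a longest common subsequence, the one-to-one matching is chosen greedily (which is not guaranteed to be optimal), the score is normalized by the total path length, and clustering uses average linkage with a hypothetical cutoff `threshold`.

```python
from itertools import combinations


def lcs_length(p, q):
    """Length of the longest common subsequence of two paths (tuples of element names)."""
    prev = [0] * (len(q) + 1)
    for a in p:
        cur = [0]
        for j, b in enumerate(q, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def schema_similarity(paths_a, paths_b):
    """Greedy one-to-one path matching; the score is normalized to [0, 1] (assumed formula)."""
    scored = sorted(
        ((lcs_length(p, q), i, j)
         for i, p in enumerate(paths_a)
         for j, q in enumerate(paths_b)),
        key=lambda t: -t[0],
    )
    used_a, used_b, common = set(), set(), 0
    for score, i, j in scored:
        # Each path may take part in at most one matched pair.
        if score and i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            common += score
    total = sum(map(len, paths_a)) + sum(map(len, paths_b))
    return 2 * common / total if total else 0.0


def cluster(schemas, threshold):
    """Average-linkage agglomerative clustering; stops when the best merge falls below `threshold`."""
    names = list(schemas)
    sim = {frozenset((a, b)): schema_similarity(schemas[a], schemas[b])
           for a, b in combinations(names, 2)}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        return sum(sim[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical toy schemas, each given as a list of root-to-leaf paths.
schemas = {
    "book_a": [("book", "title"), ("book", "author", "name")],
    "book_b": [("book", "title"), ("book", "writer", "name")],
    "order": [("order", "item", "price"), ("order", "customer")],
}
clusters = cluster(schemas, threshold=0.5)
print(clusters)
```

With these toy inputs the two book schemas share most of their path elements and are merged, while the order schema shares none and stays in its own cluster. An exact matching could replace the greedy step with an assignment solver at higher cost.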
Pages: 372-377
Page count: 6