Clustering schema elements for semantic integration of heterogeneous data sources

Authors
Zhao, HM [1]
Ram, S [2]
Affiliations
[1] Univ Wisconsin, Milwaukee, WI 53201 USA
[2] Univ Arizona, Eller Coll Business & Publ Adm, Tucson, AZ 85721 USA
Keywords
attribute correspondence; cluster analysis; heterogeneous database integration; interschema relationship identification; schema correspondence; self-organizing map
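DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract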
Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important step in integrating the data sources. This article proposes a cluster-analysis-based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. The authors apply multiple clustering techniques, including K-means, hierarchical clustering, and the self-organizing map (SOM) neural network, to identify similar schema elements from heterogeneous data sources, based on a combination of features such as naming similarity, document similarity, schema specification, data patterns, and usage patterns. An SOM prototype developed by the authors gives users a visualization tool for displaying clustering results and for incrementally evaluating candidate similar elements.
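The idea of grouping similar schema elements can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only one of the listed feature types (naming similarity, approximated here by character-trigram Jaccard overlap) and a plain single-linkage hierarchical clustering with a cutoff. The element names, the `threshold` value, and all function names are hypothetical, chosen for illustration.

```python
from itertools import combinations


def trigrams(name):
    """Return the character trigrams of a name after lowercasing and
    dropping non-alphanumeric characters (assumed normalization)."""
    s = "".join(ch for ch in name.lower() if ch.isalnum())
    return {s[i:i + 3] for i in range(len(s) - 2)} or {s}


def name_similarity(a, b):
    """Jaccard similarity of the two trigram sets, a value in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def cluster(elements, threshold=0.3):
    """Single-linkage agglomerative clustering.

    Repeatedly merges the two clusters whose closest members are most
    similar, and stops once that similarity falls below `threshold`.
    """
    clusters = [[e] for e in elements]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sim = max(name_similarity(a, b)
                      for a in clusters[i] for b in clusters[j])
            if sim > best:
                best, pair = sim, (i, j)
        if best < threshold:
            break
        i, j = pair  # i < j, so popping j leaves index i valid
        clusters[i] += clusters.pop(j)
    return clusters


# Hypothetical attribute names drawn from two imaginary source schemas.
elements = ["customer_name", "cust_name", "CustomerName",
            "order_date", "OrderDate", "date_of_order", "phone"]
clusters = cluster(elements)
for group in clusters:
    print(group)
```

With these names and the assumed cutoff of 0.3, the customer-name variants end up in one group, the order-date variants in another, and `phone` stays on its own. A fuller treatment would combine several feature types into one vector per element and could swap in K-means or an SOM, as the abstract describes.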
Pages: 88-106
Page count: 19