Reconciling schemas of disparate data sources: A machine-learning approach

Cited by: 0
Authors
Doan, AH [1]
Domingos, P [1]
Halevy, A [1]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
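Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract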
A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the source schemas and the mediated schema. We describe LSD, a system that employs and extends current machine-learning techniques to semi-automatically find such mappings. LSD first asks the user to provide the semantic mappings for a small set of data sources, then uses these mappings together with the sources to train a set of learners. Each learner exploits a different type of information either in the source schemas or in their data. Once the learners have been trained, LSD finds semantic mappings for a new data source by applying the learners, then combining their predictions using a meta-learner. To further improve matching accuracy, we extend machine-learning techniques so that LSD can incorporate domain constraints as an additional source of knowledge, and develop a novel learner that utilizes the structural information in XML documents. Our approach thus is distinguished in that it incorporates multiple types of knowledge. Importantly, its architecture is extensible to additional learners that may exploit new kinds of information. We describe a set of experiments on several real-world domains, and show that LSD proposes semantic mappings with a high degree of accuracy.
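The train-then-combine flow described in the abstract can be illustrated with a minimal Python sketch. It is not LSD's implementation: the `TokenLearner` class, the `combine` and `match` functions, the toy training data, and the fixed equal weights are all assumptions made for illustration. A real meta-learner would learn its weights from the training sources, and the domain constraints and the XML structure learner are not modeled here.

```python
from collections import Counter, defaultdict


def tokens(text):
    """Split a string into lowercase alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


class TokenLearner:
    """Toy base learner (hypothetical): scores each mediated-schema label
    by how often the input's tokens appeared in that label's training text."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, examples):
        # examples: iterable of (text, mediated_schema_label) pairs
        for text, label in examples:
            self.counts[label].update(tokens(text))

    def predict(self, text):
        toks = tokens(text)
        # Add-one smoothing so every known label gets a nonzero score.
        raw = {label: 1 + sum(c[t] for t in toks)
               for label, c in self.counts.items()}
        total = sum(raw.values())
        return {label: score / total for label, score in raw.items()}


def combine(predictions, weights):
    """Weighted average of the learners' score dictionaries.
    Assumption: weights are fixed by hand rather than learned."""
    labels = set().union(*predictions)
    total_weight = sum(weights)
    return {label: sum(w * p.get(label, 0.0)
                       for p, w in zip(predictions, weights)) / total_weight
            for label in labels}


def match(element_name, values, name_learner, data_learner, weights=(0.5, 0.5)):
    """Propose a mediated-schema label for one source-schema element."""
    predictions = [name_learner.predict(element_name),
                   data_learner.predict(" ".join(values))]
    combined = combine(predictions, weights)
    return max(combined, key=combined.get)


# One learner sees element names, the other sees data values (toy data).
name_learner = TokenLearner()
name_learner.train([("phone", "PHONE"), ("contact-phone", "PHONE"),
                    ("location", "ADDRESS"), ("house-addr", "ADDRESS")])

data_learner = TokenLearner()
data_learner.train([("(206) 555 0100", "PHONE"), ("(425) 555 0123", "PHONE"),
                    ("Seattle WA 98195", "ADDRESS"), ("Miami FL", "ADDRESS")])

print(match("agent-phone", ["(206) 555 0199"], name_learner, data_learner))
print(match("office-location", ["Seattle WA"], name_learner, data_learner))
```

Each learner sees only one kind of evidence (element names or data values), mirroring the abstract's point that different learners exploit different types of information. A further learner would plug in as one more entry in the prediction list.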
Pages: 509-520
Page count: 12
Related papers
50 in total
  • [31] Machine-Learning Methods on Noisy and Sparse Data
    Poulinakis, Konstantinos
    Drikakis, Dimitris
    Kokkinakis, Ioannis W.
    Spottswood, Stephen Michael
    MATHEMATICS, 2023, 11 (01)
  • [32] Machine-learning approach identifies Wolfcamp reservoirs
    Carpenter C.
    JPT, Journal of Petroleum Technology, 2019, 71 (03): 87 - 89
  • [33] Machine-learning approach to holographic particle characterization
    1600, OSA - The Optical Society (22):
  • [34] A machine-learning approach to predict postprandial hypoglycemia
    Wonju Seo
    You-Bin Lee
    Seunghyun Lee
    Sang-Man Jin
    Sung-Min Park
    BMC Medical Informatics and Decision Making, 19
  • [35] Machine-Learning based IoT Data Caching
    Pahl, Marc-Oliver
    Liebald, Stefan
    Wuestrich, Lars
    2019 IFIP/IEEE SYMPOSIUM ON INTEGRATED NETWORK AND SERVICE MANAGEMENT (IM), 2019,
  • [36] Machine-learning classifiers for imbalanced tornado data
    Trafalis T.B.
    Adrianto I.
    Richman M.B.
    Lakshmivarahan S.
    Computational Management Science, 2014, 11 (4) : 403 - 418
  • [37] Machine-learning techniques for macromolecular crystallization data
    Gopalakrishnan, V
    Livingston, G
    Hennessy, D
    Buchanan, B
    Rosenberg, JM
    ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY, 2004, 60 : 1705 - 1716
  • [38] Machine-Learning Metacomputing for Materials Science Data
    Steuben, J.C.
    Geltmacher, A.B.
    Rodriguez, S.N.
    Birnbaum, A.J.
    Graber, B.D.
    Rawlings, A.K.
    Iliopoulos, A.P.
    Michopoulos, J.G.
    Journal of Computing and Information Science in Engineering, 2024, 24 (11)
  • [39] MULTIPHYSICS MISSING DATA SYNTHESIS (MiDaS): A MACHINE-LEARNING APPROACH FOR MITIGATING DATA GAPS & ARTIFACTS
    Steuben, J. C.
    Geltmacher, A. B.
    Rodriguez, S. N.
    Graber, B. D.
    Iliopoulos, A. P.
    Michopoulos, J. G.
    PROCEEDINGS OF ASME 2023 INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, IDETC-CIE2023, VOL 2, 2023,
  • [40] Muography background sources: simulation, characterization, and machine-learning rejection
    Pena-Rodriguez, J.
    de'Leon-Barrios, R.
    Ramirez-Munoz, A.
    Villabona-Ardila, D.
    Suarez-Duran, M.
    Vasquez-Ramirez, A.
    Asorey, H.
    Nunez, L. A.
    37TH INTERNATIONAL COSMIC RAY CONFERENCE, ICRC2021, 2022,