Reconciling schemas of disparate data sources: A machine-learning approach

Cited by: 0
Authors
Doan, AH [1]
Domingos, P [1]
Halevy, A [1]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
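Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract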
A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the source schemas and the mediated schema. We describe LSD, a system that employs and extends current machine-learning techniques to semi-automatically find such mappings. LSD first asks the user to provide the semantic mappings for a small set of data sources, then uses these mappings together with the sources to train a set of learners. Each learner exploits a different type of information either in the source schemas or in their data. Once the learners have been trained, LSD finds semantic mappings for a new data source by applying the learners, then combining their predictions using a meta-learner. To further improve matching accuracy, we extend machine learning techniques so that LSD can incorporate domain constraints as an additional source of knowledge, and develop a novel learner that utilizes the structural information in XML documents. Our approach thus is distinguished in that it incorporates multiple types of knowledge. Importantly, its architecture is extensible to additional learners that may exploit new kinds of information. We describe a set of experiments on several real-world domains, and show that LSD proposes semantic mappings with a high degree of accuracy.
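The learner-plus-meta-learner flow described in the abstract can be illustrated with a toy sketch. This is not LSD's implementation: the mediated-schema elements, the two rule-based "learners", and the fixed combination weights are all hypothetical stand-ins chosen for illustration (in the abstract, the learners are trained from user-supplied mappings, which this sketch does not attempt).

```python
# Toy sketch of combining several base learners with a meta-learner.
# All names, rules, and weights below are assumptions, not LSD's own.
from collections import Counter

MEDIATED = ["price", "phone", "address"]  # hypothetical mediated-schema elements


def normalize(scores):
    """Scale scores to sum to 1; fall back to uniform if all are zero."""
    total = sum(scores.values())
    if total == 0:
        return {m: 1.0 / len(scores) for m in scores}
    return {m: s / total for m, s in scores.items()}


def name_learner(tag, values):
    """Base learner using schema information: the source tag name."""
    tokens = set(tag.lower().replace("_", " ").split())
    return normalize({m: (1.0 if m in tokens else 0.0) for m in MEDIATED})


def content_learner(tag, values):
    """Base learner using data information: crude features of the values."""
    votes = Counter()
    for v in values:
        if v.startswith("$"):
            votes["price"] += 1
        elif sum(c.isdigit() for c in v) >= 7:
            votes["phone"] += 1
        else:
            votes["address"] += 1
    return normalize({m: votes[m] for m in MEDIATED})


def combine(predictions, weights):
    """Meta-learner stand-in: a fixed weighted average of learner scores."""
    combined = {
        m: sum(w * p[m] for p, w in zip(predictions, weights)) for m in MEDIATED
    }
    return normalize(combined)


def match(tag, values, learners, weights):
    """Apply every learner to a source element and pick the best mapping."""
    predictions = [learner(tag, values) for learner in learners]
    combined = combine(predictions, weights)
    return max(combined, key=combined.get)


LEARNERS = [name_learner, content_learner]
WEIGHTS = [0.4, 0.6]  # assumed; a real meta-learner would learn these
```

Calling `match("amt", ["$250,000", "$310,500"], LEARNERS, WEIGHTS)` shows why several learners help: the tag name carries no signal, so the name learner returns a uniform score and the content learner decides the mapping. Adding a new kind of evidence amounts to appending another function to `LEARNERS`, which mirrors the extensibility the abstract emphasizes.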
Pages: 509-520
Page count: 12