Applications of corpus-based semantic similarity and word segmentation to database schema matching

Authors
Aminul Islam
Diana Inkpen
Iluju Kiringa
Affiliation
[1] University of Ottawa,School of Information Technology and Engineering
Source
The VLDB Journal | 2008 / Vol. 17
Keywords
Database schema matching; Semantic similarity; Word segmentation; Corpus-based methods;
DOI
Not available
Abstract
In this paper, we present a method for database schema matching: the problem of identifying elements of two given schemas that correspond to each other. Schema matching is useful in e-commerce exchanges, in data integration/warehousing, and in semantic web applications. We first present two corpus-based methods: one method is for determining the semantic similarity of two target words and the other is for automatic word segmentation. Then we present a name-based element-level database schema matching method that exploits both the semantic similarity and the word segmentation methods. Our word similarity method uses pointwise mutual information (PMI) to sort lists of important neighbor words of two target words; the words which are common in both lists are selected and their PMI values are aggregated to calculate the relative similarity score. Our word segmentation method uses corpus type frequency information to choose the type with maximum length and frequency from “desegmented” text. It also uses a modified forward–backward matching technique using maximum length frequency and entropy rate if any non-matching portions of the text exist. Finally, we exploit both the semantic similarity and the word segmentation methods in our proposed name-based element-level schema matching method. This method uses a single property (i.e., element name) for schema matching and nevertheless achieves a measure score that is comparable to the methods that use multiple properties (e.g., element name, text description, data instance, context description). Our schema matching method also uses normalized and modified versions of the longest common subsequence string matching algorithm with weight factors to allow for a balanced combination. We validate our methods with experimental studies, the results of which suggest that these methods can be a useful addition to the set of existing methods.
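The abstract names two building blocks, pointwise mutual information (PMI) and a normalized longest common subsequence (LCS) string measure, without giving formulas. The following is a minimal sketch of both, not the authors' implementation. The function names are hypothetical, the PMI is the standard textbook definition, and the normalization (squared LCS length divided by the product of the two string lengths) is an assumption chosen for illustration. The neighbor-word aggregation, the weight factors, and the word segmentation step described in the abstract are not reproduced here.

```python
import math


def pmi(pair_count: int, count_a: int, count_b: int, total: int) -> float:
    """Standard PMI: log2( P(a, b) / (P(a) * P(b)) ), from raw corpus counts."""
    if pair_count == 0:
        return float("-inf")  # the two words never co-occur
    return math.log2((pair_count * total) / (count_a * count_b))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: LCS length squared over the product of lengths."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) ** 2 / (len(a) * len(b))
```

For two element names such as "custname" and "customername", the shorter name is a subsequence of the longer one, so the LCS length is 8 and the normalized score under this assumption is 64 / (8 × 12) ≈ 0.67. Squaring the LCS length keeps the score in [0, 1] and penalizes length mismatch symmetrically.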
Pages: 1293 - 1320
Page count: 27
Related papers
50 in total
  • [41] Strudel: A Corpus-Based Semantic Model Based on Properties and Types
    Baroni, Marco
    Murphy, Brian
    Barbu, Eduard
    Poesio, Massimo
    [J]. COGNITIVE SCIENCE, 2010, 34 (02) : 222 - 254
  • [42] A Corpus-based View of Semantic Prosody in Business English
    Li Zeying
    [J]. 2012 INTERNATIONAL CONFERENCE ON EDUCATION REFORM AND MANAGEMENT INNOVATION (ERMI 2012), VOL 5, 2013 : 293 - 298
  • [43] Using Semantic Similarity for Schema Matching of Semi-structured and Linked Data
    Kettouch, Mohamed Salah
    Luca, Cristina
    Hobbs, Mike
    Dascalu, Sergiu
    [J]. PROCEEDINGS OF THE 2017 7TH INTERNATIONAL CONFERENCE INTERNET TECHNOLOGIES AND APPLICATIONS (ITA), 2017 : 128 - 133
  • [44] Using ontologies for measuring semantic similarity in data warehouse schema matching process
    Banek, M.
    Vrdoljak, B.
    Tjoa, A. M.
    [J]. CONTEL 2007: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS, 2007, : 227 - +
  • [45] Symbolic Segmentation: A Corpus-Based Analysis of Melodic Phrases
    Rodriguez-Lopez, Marcelo
    Volk, Anja
    [J]. SOUND, MUSIC, AND MOTION, 2014, 8905 : 548 - 557
  • [46] Corpus-based translation studies: research and applications
    Song, Hua
    [J]. PERSPECTIVES-STUDIES IN TRANSLATION THEORY AND PRACTICE, 2016, 24 (02) : 339 - 341
  • [47] Semantic Matching Based on Semantic Segmentation and Neighborhood Consensus
    Xu, Huaiyuan
    Chen, Xiaodong
    Cai, Huaiyu
    Wang, Yi
    Liang, Haitao
    Li, Haotian
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (10)
  • [48] Word-formation complexity: a learner corpus-based study
    Lyashevskaya, Olga
    Pyzhak, Julia
    Vinogradova, Olga
    [J]. RUSSIAN JOURNAL OF LINGUISTICS, 2022, 26 (02) : 471 - 492
  • [49] Behavioral profiles: A corpus-based approach to cognitive semantic analysis
    Gries, Stefan Th.
    Divjak, Dagmar
    [J]. NEW DIRECTIONS IN COGNITIVE LINGUISTICS, 2009, 24 : 57 - 75
  • [50] Analysis on Semantic Prosody of 'mianzi' and 'lian': A Corpus-Based Study
    Gan, Yeechin
    [J]. CHINESE LEXICAL SEMANTICS (CLSW 2015), 2015, 9332 : 101 - 111