Applications of corpus-based semantic similarity and word segmentation to database schema matching

Authors
Aminul Islam
Diana Inkpen
Iluju Kiringa
Affiliation
[1] University of Ottawa, School of Information Technology and Engineering
Source
The VLDB Journal | 2008 / Vol. 17
Keywords
Database schema matching; Semantic similarity; Word segmentation; Corpus-based methods
DOI
Not available
Abstract
In this paper, we present a method for database schema matching: the problem of identifying elements of two given schemas that correspond to each other. Schema matching is useful in e-commerce exchanges, in data integration/warehousing, and in semantic web applications. We first present two corpus-based methods: one method is for determining the semantic similarity of two target words and the other is for automatic word segmentation. Then we present a name-based element-level database schema matching method that exploits both the semantic similarity and the word segmentation methods. Our word similarity method uses pointwise mutual information (PMI) to sort lists of important neighbor words of two target words; the words which are common in both lists are selected and their PMI values are aggregated to calculate the relative similarity score. Our word segmentation method uses corpus type frequency information to choose the type with maximum length and frequency from “desegmented” text. It also uses a modified forward–backward matching technique using maximum length frequency and entropy rate if any non-matching portions of the text exist. Finally, we exploit both the semantic similarity and the word segmentation methods in our proposed name-based element-level schema matching method. This method uses a single property (i.e., element name) for schema matching and nevertheless achieves a measure score that is comparable to the methods that use multiple properties (e.g., element name, text description, data instance, context description). Our schema matching method also uses normalized and modified versions of the longest common subsequence string matching algorithm with weight factors to allow for a balanced combination. We validate our methods with experimental studies, the results of which suggest that these methods can be a useful addition to the set of existing methods.
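The word similarity step described in the abstract can be illustrated with a minimal Python sketch. It assumes a whitespace-tokenized corpus, a symmetric co-occurrence window, and the standard PMI formula log2(f(w,c) · N / (f(w) · f(c))). The aggregation used here (summing the PMI values of neighbors shared by both top-β lists and dividing by 2β) is a hypothetical simplification, not necessarily the paper's normalization; the names `pmi_similarity`, `window`, and `beta` are illustrative.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count unigrams and (word, context) pairs within +/- `window` tokens."""
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs[(word, tokens[j])] += 1
    return unigrams, pairs


def pmi(word, context, unigrams, pairs, n):
    """Pointwise mutual information: log2(f(w,c) * N / (f(w) * f(c)))."""
    joint = pairs[(word, context)]
    if joint == 0:
        return 0.0  # unseen pair: treated as carrying no information
    return math.log2(joint * n / (unigrams[word] * unigrams[context]))


def top_neighbors(word, unigrams, pairs, n, beta):
    """Return the `beta` highest-PMI context words of `word` (positive PMI only)."""
    scores = {c: pmi(word, c, unigrams, pairs, n) for (w, c) in pairs if w == word}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {c: s for c, s in ranked[:beta] if s > 0}


def pmi_similarity(w1, w2, tokens, window=2, beta=5):
    """Hypothetical score: summed PMI of neighbors common to both top-beta lists."""
    unigrams, pairs = cooccurrence_counts(tokens, window)
    n = len(tokens)
    n1 = top_neighbors(w1, unigrams, pairs, n, beta)
    n2 = top_neighbors(w2, unigrams, pairs, n, beta)
    common = n1.keys() & n2.keys()
    return sum(n1[c] + n2[c] for c in common) / (2 * beta)


corpus = ("the cat drinks milk . the dog drinks water . "
          "the car needs fuel .").split()
print(pmi_similarity("cat", "dog", corpus))  # shares "the" and "drinks"
print(pmi_similarity("cat", "car", corpus))  # shares only "the"
```

On this toy corpus "cat" and "dog" share two high-PMI neighbors while "cat" and "car" share one, so the first pair scores higher. A real corpus would need to be far larger for PMI estimates to be meaningful.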
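The segmentation step can be approximated with greedy forward and backward maximum matching against a word-frequency dictionary, keeping whichever pass leaves fewer unmatched characters. This is a sketch under stated assumptions: the dictionary `FREQ` and its counts are invented for illustration, ties are broken by fewer segments and then by higher summed log frequency, and the entropy-rate fallback mentioned in the abstract is not modeled.

```python
import math

# Hypothetical corpus type frequencies (illustrative counts only).
FREQ = {"customer": 50, "address": 40, "add": 30, "dress": 10,
        "item": 70, "items": 40, "size": 25}


def forward_match(text, freq, max_len):
    """Left to right: at each position take the longest dictionary word."""
    i, out = 0, []
    while i < len(text):
        piece = text[i]  # fallback: a single unmatched character
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in freq:
                piece = text[i:i + length]
                break
        out.append(piece)
        i += len(piece)
    return out


def backward_match(text, freq, max_len):
    """Right to left: at each end position take the longest dictionary word."""
    j, out = len(text), []
    while j > 0:
        piece = text[j - 1]  # fallback: a single unmatched character
        for length in range(min(max_len, j), 0, -1):
            if text[j - length:j] in freq:
                piece = text[j - length:j]
                break
        out.append(piece)
        j -= len(piece)
    return out[::-1]


def score(segments, freq):
    """Lower is better: unmatched chars, then segment count, then -sum(log f)."""
    unmatched = sum(len(s) for s in segments if s not in freq)
    log_freq = sum(math.log(freq[s]) for s in segments if s in freq)
    return (unmatched, len(segments), -log_freq)


def segment(text, freq=FREQ):
    """Run both passes and keep the better-scoring segmentation."""
    max_len = max(map(len, freq))
    candidates = [forward_match(text, freq, max_len),
                  backward_match(text, freq, max_len)]
    return min(candidates, key=lambda segs: score(segs, freq))


print(segment("customeraddress"))  # ['customer', 'address']
print(segment("itemsize"))         # ['item', 'size'] (backward pass wins)
```

For "itemsize" the forward pass greedily takes "items" and strands "ize" as three unmatched characters, which is the kind of non-matching portion that motivates combining both directions.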
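For the string-matching component, one plausible reading of "normalized and modified versions of the longest common subsequence" is sketched below: plain LCS, a consecutive match anchored at the first character (the common prefix), and a consecutive match anywhere (the longest common substring). Each length is squared and divided by the product of the two name lengths, and the three scores are combined with weights. The choice of variants, the squared normalization, and the equal weights of 1/3 are assumptions for illustration, not values taken from this record.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (classic dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def common_prefix_length(a, b):
    """Length of the consecutive match starting at the first character."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def longest_common_substring_length(a, b):
    """Length of the longest consecutive match starting anywhere."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def string_similarity(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three normalized LCS-style scores (assumed weights)."""
    if not a or not b:
        return 0.0
    denom = len(a) * len(b)
    v1 = lcs_length(a, b) ** 2 / denom
    v2 = common_prefix_length(a, b) ** 2 / denom
    v3 = longest_common_substring_length(a, b) ** 2 / denom
    w1, w2, w3 = weights
    return w1 * v1 + w2 * v2 + w3 * v3


print(string_similarity("custname", "customername"))  # about 0.333
```

Squaring each length before dividing by the product of the two lengths keeps every component in [0, 1] and penalizes short matches between long names; the weights let a matcher trade off order-preserving matches against contiguous ones.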
Pages: 1293-1320
Page count: 28