Applications of corpus-based semantic similarity and word segmentation to database schema matching

Authors
Aminul Islam
Diana Inkpen
Iluju Kiringa
Affiliation
[1] School of Information Technology and Engineering, University of Ottawa
Source
The VLDB Journal | 2008 / Vol. 17
Keywords
Database schema matching; Semantic similarity; Word segmentation; Corpus-based methods
DOI
Not available
Abstract
In this paper, we present a method for database schema matching, the problem of identifying the elements of two given schemas that correspond to each other. Schema matching is useful in e-commerce exchanges, in data integration and warehousing, and in semantic web applications. We first present two corpus-based methods: one determines the semantic similarity of two target words, and the other performs automatic word segmentation. Our word similarity method uses pointwise mutual information (PMI) to sort lists of important neighbor words of the two target words; the words common to both lists are selected, and their PMI values are aggregated to calculate the relative similarity score. Our word segmentation method uses corpus type frequency information to choose the type with maximum length and frequency from “desegmented” text. If any non-matching portions of the text remain, it also applies a modified forward–backward matching technique that uses maximum length frequency and entropy rate. We then exploit both the semantic similarity and the word segmentation methods in our proposed name-based element-level schema matching method. This method uses a single property (i.e., element name) for schema matching and nevertheless achieves a measure score comparable to that of methods that use multiple properties (e.g., element name, text description, data instance, context description). It also uses normalized and modified versions of the longest common subsequence string matching algorithm, with weight factors to allow for a balanced combination. We validate our methods with experimental studies, the results of which suggest that these methods can be a useful addition to the set of existing methods.
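The abstract names two building blocks that are easy to illustrate in isolation: pointwise mutual information and a normalized longest common subsequence score. The following is a minimal sketch, not the authors' implementation. The function names, the count-based PMI estimate, and the normalization (squared LCS length divided by the product of the string lengths) are assumptions made for illustration; the abstract does not specify them, and the paper's neighbor-word lists, weight factors, and modified LCS variants are not reproduced here.

```python
import math


def pmi(pair_count: int, x_count: int, y_count: int, total: int) -> float:
    """Pointwise mutual information, log2(P(x, y) / (P(x) * P(y))).

    Probabilities are estimated from raw corpus counts (an assumption):
    P(x, y) = pair_count / total, P(x) = x_count / total, and so on.
    """
    if min(pair_count, x_count, y_count) <= 0:
        return float("-inf")  # the pair (or a word) was never observed
    return math.log2(pair_count * total / (x_count * y_count))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """LCS length squared over the product of the lengths (assumed normalization).

    The score lies in [0, 1] and equals 1 for identical non-empty strings.
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) ** 2 / (len(a) * len(b))


if __name__ == "__main__":
    # Hypothetical element names from two schemas.
    print(normalized_lcs("custname", "customername"))
    # Hypothetical corpus counts for a word pair.
    print(pmi(pair_count=10, x_count=20, y_count=50, total=1000))
```

A name-based matcher in this spirit would compute such scores for every pair of element names and combine the string score with the corpus-based semantic score; how the weights are chosen is left open in this sketch.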
Pages: 1293-1320
Page count: 27