An Improved String Similarity Measure Based on Combining Information-Theoretic and Edit Distance Methods

Times cited: 0
Authors
Nguyen, Thi Thuy Anh [1]
Conrad, Stefan [1]
Affiliation
[1] Univ Dusseldorf, Inst Comp Sci, Univ Str 1, D-40225 Dusseldorf, Germany
Keywords
Information-theoretic model; Feature-based measure; String-based measure; Similarity
DOI
10.1007/978-3-319-25840-9_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A lexical similarity measure calculates the similarity between strings. Existing lexical-based methods are usually based on either n-grams or Dice's approach. These measures perform well and can be tuned by adjusting their parameters. However, they do not return reasonable results in some situations, for example when strings are quite similar, or when two strings contain the same set of characters in different positions. To deal with this problem, this paper presents an approach that improves a lexical-based measure by combining information-theoretic and edit distance measures. The proposed method is tested on part of the OAEI 2008 benchmark. The results show that our approach has several notable advantages compared with other lexical-based methods. It is also flexible and easy to implement. Moreover, we identify a range of good parameter values that can be applied in different domains.
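The failure case named in the abstract (same characters, different positions) can be illustrated with a minimal Python sketch of the two baseline ingredients: a Dice coefficient over character n-grams and a length-normalized Levenshtein similarity. This is not the authors' combined measure, whose formula is not given in this record. The function names (`dice`, `levenshtein`, `edit_similarity`), the unpadded n-grams, and the normalization by the longer string's length are all assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(s: str, n: int) -> Counter:
    """Multiset of character n-grams of s (no padding; assumption)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over n-gram multisets."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        # Both strings are shorter than n: fall back to exact equality.
        return 1.0 if a == b else 0.0
    # Counter & Counter keeps the minimum count of each shared n-gram.
    return 2 * sum((ga & gb).values()) / total


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer string (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


if __name__ == "__main__":
    for n in (1, 2):
        print(f"Dice n={n}:", dice("listen", "silent", n=n))
    print("edit similarity:", edit_similarity("listen", "silent"))
```

For the anagrams "listen" and "silent", Dice over single characters returns 1.0 even though the strings differ. Dice over bigrams returns 0.2, and the edit-based similarity returns 1 - 4/6, about 0.33, because edit distance is sensitive to character order.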
Pages: 228-239
Number of pages: 12
Related papers
  • [1] Information-theoretic and set-theoretic similarity
    Cazzanti, Luca
    Gupta, Maya R.
    [J]. 2006 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, VOLS 1-6, PROCEEDINGS, 2006: 1836+
  • [2] The Edit Distance as a Measure of Perceived Rhythmic Similarity
    Post, Olaf
    Toussaint, Godfried
    [J]. EMPIRICAL MUSICOLOGY REVIEW, 2011, 6 (03): 164-179
  • [3] Design of a hybrid measure for image similarity: a statistical, algebraic, and information-theoretic approach
    Aljanabi, Mohammed Abdulameer
    Hussain, Zahir M.
    Shnain, Noor Abd Alrazak
    Lu, Song Feng
    [J]. EUROPEAN JOURNAL OF REMOTE SENSING, 2019, 52 (sup4): 2-15
  • [4] AN INFORMATION-THEORETIC MEASURE OF TERM SPECIFICITY
    WONG, SKM
    YAO, YY
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1992, 43 (01): 54-61
  • [5] Bounded Occurrence Edit Distance: A New Metric for String Similarity Joins with Edit Distance Constraints
    Komatsu, Tomoki
    Okuta, Ryosuke
    Narisawa, Kazuyuki
    Shinohara, Ayumi
    [J]. SOFSEM 2014: THEORY AND PRACTICE OF COMPUTER SCIENCE, 2014, 8327: 363-374
  • [6] Edit Distance Based Similarity Search of Heterogeneous Information Networks
    Lu, Jianhua
    Lu, Ningyun
    Ma, Sipei
    Zhang, Baili
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA 2018), PT II, 2018, 11030: 195-202
  • [7] An information-theoretic approach to combining object models
    Kruppa, H
    Schiele, B
    [J]. ROBOTICS AND AUTONOMOUS SYSTEMS, 2002, 39 (3-4): 195-203
  • [8] Measuring structural similarity of semistructured data based on information-theoretic approaches
    Helmer, Sven
    Augsten, Nikolaus
    Boehlen, Michael
    [J]. VLDB JOURNAL, 2012, 21 (05): 677-702
  • [9] Information-Theoretic and Statistical Methods of Failure Log Selection for Improved Diagnosis
    Tanwir, Sannad
    Prabhu, Sarvesh
    Hsiao, Michael
    Lingappan, Loganathan
    [J]. 2015 IEEE INTERNATIONAL TEST CONFERENCE (ITC), 2015