A COMBINED MEASURE FOR TEXT SEMANTIC SIMILARITY

Cited by: 0
Authors
Li, Hao-Di [1]
Chen, Qing-Cai [1]
Wang, Xiao-Long [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Comp Sci & Technol, Intelligent Comp Res Ctr, Harbin, Heilongjiang, Peoples R China
Keywords
Semantic similarity; Combination of rule and statistical measure; Sentence level semantic similarity
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid development of artificial intelligence and natural language processing, text similarity calculation has become a core module of many applications, such as semantic disambiguation, information retrieval, automatic question answering, and data mining. Most existing semantic similarity algorithms are either statistical methods or rule-based methods built on ontology dictionaries and other knowledge bases. Rule-based methods usually use a dictionary, an ontology tree or graph, or the co-occurrence counts of attributes, while statistical methods may or may not use a knowledge base. Because a statistical method that uses a knowledge base incorporates more comprehensive knowledge and can reduce knowledge noise, it usually achieves better performance. Nevertheless, because items in a knowledge base are unevenly distributed, semantic similarity results for low-frequency words are usually poor. To address this issue, this thesis presents a combined measure for semantic similarity calculation. First, we study existing statistical methods that are based on ontology dictionary rules and corpora, and compare their advantages and disadvantages. We then propose a method that combines rules and statistical measures for word-level semantic similarity calculation, built on the English and Chinese Wikipedia databases and the HowNet semantic dictionary. For sentence-level semantic similarity computation, syntactic information, edit distance, and semantic similarity are combined to improve performance. The combined method is verified by experiments on standard English and Chinese corpora, where it achieves the best results among all compared methods of the same kind.
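The abstract does not give the combination formula, so the following is only a minimal sketch of how a sentence-level score might blend edit distance with word-level semantic similarity. The linear weighting, the `alpha` parameter, the best-match averaging, and the toy `word_sim` lookup are all assumptions made for illustration, not the authors' method, and the syntactic component is left out.

```python
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Token-level Levenshtein distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Map edit distance to [0, 1]; identical sequences score 1.0."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def semantic_similarity(a: Sequence[str], b: Sequence[str], word_sim: WordSim) -> float:
    """Symmetric average of each word's best match in the other sentence."""
    if not a or not b:
        return 0.0

    def directed(src: Sequence[str], dst: Sequence[str]) -> float:
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)

    return 0.5 * (directed(a, b) + directed(b, a))


def combined_similarity(a: Sequence[str], b: Sequence[str],
                        word_sim: WordSim, alpha: float = 0.5) -> float:
    """Hypothetical linear blend; alpha weights the semantic component."""
    return alpha * semantic_similarity(a, b, word_sim) + (1 - alpha) * edit_similarity(a, b)


# Toy word-level similarity (assumed values); a real system would query
# resources such as Wikipedia statistics or the HowNet dictionary instead.
TOY_SCORES = {frozenset(("cat", "dog")): 0.8}


def toy_word_sim(w: str, v: str) -> float:
    return 1.0 if w == v else TOY_SCORES.get(frozenset((w, v)), 0.0)


score = combined_similarity(["the", "cat", "sleeps"], ["the", "dog", "sleeps"], toy_word_sim)
print(round(score, 3))
```

In this sketch the semantic term rewards the related pair "cat"/"dog" while the edit-distance term penalizes the substitution, so the blended score sits between the two.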
Pages: 1869-1873
Page count: 5