A COMBINED MEASURE FOR TEXT SEMANTIC SIMILARITY

Cited by: 0
Authors
Li, Hao-Di [1]
Chen, Qing-Cai [1]
Wang, Xiao-Long [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Comp Sci & Technol, Intelligent Comp Res Ctr, Harbin, Heilongjiang, Peoples R China
Keywords
Semantic similarity; Combination of rule and statistical measure; Sentence level semantic similarity
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid development of artificial intelligence and natural language processing, text similarity calculation has become a core module of many applications, such as semantic disambiguation, information retrieval, automatic question answering and data mining. Most existing semantic similarity algorithms are either statistical methods or rule-based methods built on ontology dictionaries and other knowledge bases. Rule-based methods usually use a dictionary, an ontology tree or graph, or the co-occurrence counts of attributes, while statistical methods may or may not use a knowledge base. A statistical method that uses a knowledge base incorporates more comprehensive knowledge and can reduce knowledge noise, so it usually obtains better performance. Nevertheless, because different items are unevenly distributed in a knowledge base, the semantic similarity results for low-frequency words are usually poor. To address this issue, this thesis presents a combined measure for semantic similarity calculation. We first study existing methods based on ontology dictionary rules and on corpus statistics and compare their advantages and disadvantages. We then propose a method that combines rules and statistical measures for word-level semantic similarity calculation, built on the English and Chinese Wikipedia databases and the HowNet semantic dictionary. For sentence-level semantic similarity computation, syntactic information, edit distance and semantic similarity are combined to improve performance. The proposed combined method is verified by experiments on standard English and Chinese corpora and achieves the best results among all the compared methods of the same kind.
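The abstract does not give formulas, so the following Python sketch only illustrates the general shape of such a combination, under stated assumptions. The linear weighting, the default weights, the best-match averaging and all function names (`combined_word_similarity`, `sentence_similarity`, and so on) are hypothetical and are not taken from the paper. The rule-based, statistical and syntactic scorers are passed in as callables because the abstract does not specify them.

```python
from typing import Callable, Sequence, Tuple

WordSim = Callable[[str, str], float]
SentSim = Callable[[Sequence[str], Sequence[str]], float]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences (tokens or characters)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance normalised to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def combined_word_similarity(w1: str, w2: str, rule_sim: WordSim,
                             stat_sim: WordSim, alpha: float = 0.5) -> float:
    """Assumed linear mix of a rule-based and a statistical word score."""
    return alpha * rule_sim(w1, w2) + (1.0 - alpha) * stat_sim(w1, w2)


def semantic_similarity(a: Sequence[str], b: Sequence[str],
                        word_sim: WordSim) -> float:
    """Symmetric average of each word's best match in the other sentence."""
    if not a or not b:
        return 0.0

    def directed(src: Sequence[str], dst: Sequence[str]) -> float:
        return sum(max(word_sim(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(a, b) + directed(b, a))


def sentence_similarity(a: Sequence[str], b: Sequence[str],
                        word_sim: WordSim, syntax_sim: SentSim,
                        weights: Tuple[float, float, float] = (0.2, 0.3, 0.5),
                        ) -> float:
    """Assumed weighted sum of syntactic, edit-distance and semantic scores."""
    w_syn, w_edit, w_sem = weights  # hypothetical weights, not from the paper
    return (w_syn * syntax_sim(a, b)
            + w_edit * edit_similarity(a, b)
            + w_sem * semantic_similarity(a, b, word_sim))
```

In practice, `rule_sim` would be backed by a resource such as HowNet and `stat_sim` by corpus statistics such as those derived from Wikipedia, and the weights would need to be tuned on held-out data.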
Pages: 1869-1873
Number of pages: 5