A COMBINED MEASURE FOR TEXT SEMANTIC SIMILARITY

Cited by: 0

Authors:

Li, Hao-Di [1]

Chen, Qing-Cai [1]

Wang, Xiao-Long [1]

Affiliations:

[1] Harbin Inst Technol, Shenzhen Grad Sch, Comp Sci & Technol, Intelligent Comp Res Ctr, Harbin, Heilongjiang, Peoples R China

Source:

PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4 | 2013

Keywords:

Semantic similarity; Combination of rule and statistical measure; Sentence level semantic similarity;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid development of artificial intelligence and natural language processing, text similarity calculation has become a core module of many applications, such as semantic disambiguation, information retrieval, automatic question answering, and data mining. Most existing semantic similarity algorithms are either statistical or rule-based, and operate on ontology dictionaries and other knowledge bases. Rule-based methods usually use a dictionary, an ontology tree or graph, or the co-occurrence count of attributes, while statistical methods may or may not use a knowledge base. Because a statistical method that uses a knowledge base incorporates more comprehensive knowledge and can reduce knowledge noise, it usually performs better. Nevertheless, because items in a knowledge base are unevenly distributed, semantic similarity results for low-frequency words are usually poor. To address this issue, this paper presents a combined measure for semantic similarity calculation. First, we study existing statistical methods that are based on ontology dictionary rules and corpora, and compare their advantages and disadvantages. We then propose a method that combines rules and statistical measures for word-level semantic similarity calculation, built on the English and Chinese Wikipedia databases and the HowNet semantic dictionary. For sentence-level semantic similarity computation, syntactic information, edit distance, and semantic similarity are combined to improve performance. The proposed combined method is verified by experiments on standard English and Chinese corpora, and it achieves the best results among all compared methods of the same kind.
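The sentence-level combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formula, so the linear weighting, the `alpha` parameter, the greedy best-match aggregation, and every function name below are assumptions made for illustration. The syntactic component is omitted because it would require a parser, and `word_sim` stands in for any word-level measure (for example, one built on Wikipedia or HowNet).

```python
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            # deletion, insertion, substitution
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance normalized to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def semantic_similarity(a: Sequence[str], b: Sequence[str], word_sim: WordSim) -> float:
    """Symmetric average of each word's best match in the other sentence."""
    if not a or not b:
        return 0.0

    def directed(src: Sequence[str], dst: Sequence[str]) -> float:
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)

    return 0.5 * (directed(a, b) + directed(b, a))


def combined_similarity(
    a: Sequence[str], b: Sequence[str], word_sim: WordSim, alpha: float = 0.5
) -> float:
    """Assumed linear mix: alpha * semantic + (1 - alpha) * edit similarity."""
    return alpha * semantic_similarity(a, b, word_sim) + (1.0 - alpha) * edit_similarity(a, b)


def exact_match(w: str, v: str) -> float:
    """Placeholder word-level measure; a real system would plug in a learned one."""
    return 1.0 if w == v else 0.0


if __name__ == "__main__":
    s1 = "the cat sat".split()
    s2 = "the cat slept".split()
    print(combined_similarity(s1, s2, exact_match))
```

With the placeholder `exact_match`, the semantic term reduces to word overlap; swapping in a graded word-level measure is what lets near-synonyms contribute to the score while the edit-distance term still rewards similar word order.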


Pages: 1869-1873

Page count: 5
