A COMBINED MEASURE FOR TEXT SEMANTIC SIMILARITY

Cited by: 0

Authors:

Li, Hao-Di [1]

Chen, Qing-Cai [1]

Wang, Xiao-Long [1]

Affiliations:

[1] Harbin Inst Technol, Shenzhen Grad Sch, Comp Sci & Technol, Intelligent Comp Res Ctr, Harbin, Heilongjiang, Peoples R China

Source:

PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4 | 2013

Keywords:

Semantic similarity; Combination of rule and statistical measure; Sentence level semantic similarity;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

With the rapid development of artificial intelligence and natural language processing, text similarity calculation has become a core module of many applications, such as semantic disambiguation, information retrieval, automatic question answering, and data mining. Most existing semantic similarity algorithms are either statistical or rule-based, and operate on ontology dictionaries or other knowledge bases. Rule-based methods usually use a dictionary, an ontology tree or graph, or attribute co-occurrence counts, while statistical methods may or may not use a knowledge base. Because a statistical method that uses a knowledge base incorporates more comprehensive knowledge and can reduce knowledge noise, it usually performs better. Nevertheless, because items are unevenly distributed in a knowledge base, semantic similarity results for low-frequency words are usually poor. To address this issue, this paper presents a combined measure for semantic similarity calculation. First, we study existing methods based on ontology dictionary rules and on corpus statistics, and compare their advantages and disadvantages. We then propose a method that combines rules and statistical measures for word-level semantic similarity calculation, built on the English and Chinese Wikipedia databases and the HowNet semantic dictionary. For sentence-level semantic similarity, syntactic information, edit distance, and semantic similarity are combined to improve performance. The proposed combined method is verified by experiments on standard English and Chinese corpora, where it achieves the best results among the compared methods of the same kind.
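The abstract does not give the combination formula, so the following is only a minimal sketch of how a sentence-level score might blend edit distance with word-level semantic similarity. The linear weighting, the `alpha` parameter, the best-match averaging, and all function names are assumptions made for illustration; they are not the authors' method. The syntactic component is omitted, and `word_sim` stands in for whatever word-level measure (for example, one built on Wikipedia and HowNet) is available.

```python
from typing import Callable, Sequence

WordSim = Callable[[str, str], float]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences (tokens or characters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Edit distance normalized to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def semantic_similarity(a: Sequence[str], b: Sequence[str],
                        word_sim: WordSim) -> float:
    """Symmetrized average of each word's best match in the other sentence."""
    if not a or not b:
        return 1.0 if not a and not b else 0.0

    def directed(src: Sequence[str], dst: Sequence[str]) -> float:
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)

    return 0.5 * (directed(a, b) + directed(b, a))


def combined_similarity(a: Sequence[str], b: Sequence[str],
                        word_sim: WordSim, alpha: float = 0.5) -> float:
    """Assumed linear blend; alpha weights the semantic component."""
    return (alpha * semantic_similarity(a, b, word_sim)
            + (1.0 - alpha) * edit_similarity(a, b))


def exact_match(w: str, v: str) -> float:
    """Placeholder word-level measure: 1.0 for identical words, else 0.0."""
    return 1.0 if w == v else 0.0


if __name__ == "__main__":
    s1 = ["the", "cat", "sat"]
    s2 = ["the", "cat", "slept"]
    print(combined_similarity(s1, s2, exact_match))
```

In practice `exact_match` would be replaced by a real word-level measure, and `alpha` would be tuned on held-out data rather than fixed at 0.5.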


Pages: 1869-1873

Number of pages: 5
