Sentence similarity calculation method based on lexical, syntactic and semantic

Cited by: 0

Authors:

Zhai S. [1,2]

Li Z. [1]

Duan H. [1]

Li J. [1]

Dong D. [1]

Affiliations:

[1] School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an

[2] Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an

Source:

Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition) | 2019, Vol. 49, No. 6

Keywords:

Lexical layer; Ontology; Semantic layer; Sentence similarity; Syntactic layer

DOI:

10.3969/j.issn.1001-0505.2019.06.011

Abstract:

To address the problem that existing sentence similarity algorithms do not consider semantic information, a similarity computation method based on lexical, syntactic and semantic information is proposed. Sentence similarity is computed at three levels: the lexical layer, the syntactic layer and the semantic layer. In the lexical layer, a lexical similarity matrix and a digital sequence similarity matrix are constructed to calculate the similarity of the sentences. In the syntactic layer, sentence similarity is calculated from the similarity of the Resource Description Framework (RDF) triples converted from conceptual vocabularies. In the semantic layer, a semantic distance based on the shortest path in the ontology structure is used to calculate the semantic similarity. From these three layers, a semantic similarity calculation model for sentences is proposed. Sentence pairs from the book domain were collected as the test set, and a book ontology was constructed as the knowledge source. Experimental results show that the proposed method has higher accuracy and recall rate, and its F-measure reaches 0.6499. Compared with the cosine similarity algorithm, the Levenshtein algorithm and the TF-IDF (term frequency-inverse document frequency) algorithm, the F-measure is increased by about 12%, 17% and 16%, respectively. © 2019, Editorial Department of Journal of Southeast University. All rights reserved.
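The semantic-layer idea in the abstract (a semantic distance taken from the shortest path between concepts in an ontology) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy ontology `TOY_ONTOLOGY`, the function names, and in particular the mapping `1 / (1 + d)` from path length to similarity are hypothetical and are not the formula or the book ontology used in the paper.

```python
from collections import deque

# Hypothetical toy "book" ontology stored as an undirected adjacency list.
# It is illustrative only and is not the ontology built by the authors.
TOY_ONTOLOGY = {
    "Book": ["Novel", "Textbook"],
    "Novel": ["Book", "ScienceFiction"],
    "Textbook": ["Book"],
    "ScienceFiction": ["Novel"],
}


def shortest_path_length(graph, start, goal):
    """Breadth-first search: number of edges on the shortest path, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the two concepts are not connected


def path_similarity(graph, a, b):
    """Assumed distance-to-similarity mapping: 1 / (1 + d); 0.0 if unreachable."""
    d = shortest_path_length(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)
```

In this sketch, concepts that sit closer together in the ontology receive a higher score, and identical concepts score 1.0. A full model in the spirit of the abstract would combine such a semantic score with the lexical-layer and syntactic-layer scores; the abstract does not say how the three are weighted.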

Citation:

Pages: 1094-1100

Page count: 6

