An Algorithm of Semantic Similarity Between Words Based on Word Single-meaning Embedding Model

Authors
Li X.-T. [1]
You S.-J. [1]
Chen W. [1]
Affiliation
[1] China Mobile Research Institute, Beijing
Keywords
Semantic similarity; Tongyici Cilin; Word sense disambiguation (WSD); Word single-meaning embedding model; Word2vec
DOI
10.16383/j.aas.c180312
Abstract
We propose a novel algorithm for computing semantic similarity between words, based on our word single-meaning embedding model, to address the low accuracy of existing word-embedding-based approaches on polysemous words, nonadjacent words, and synonyms. Unlike existing word embedding models, our model decomposes each polysemous word into a series of monosemous words, so that each word meaning corresponds to exactly one vector. First, word sense disambiguation (WSD) of polysemous words in their different corpus contexts is performed using the prior classification information contained in Tongyici Cilin. Then, the word single-meaning embeddings are learned from the processed corpus, giving a precise representation of each word meaning; to the best of our knowledge, no existing word embedding model accomplishes this. Finally, the two test words are decomposed into marked monosemous words according to their number of meanings and expanded with synonyms, and the semantic relatedness between the words is computed based on the word single-meaning embedding model and Tongyici Cilin. Experimental results show that our method significantly improves the accuracy on polysemous words, nonadjacent words, and synonyms. Copyright © 2020 Acta Automatica Sinica. All rights reserved.
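The final step described in the abstract, comparing two words through their separate senses, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sense tags, the toy vectors, the sense inventory, and the choice of taking the maximum cosine over all sense pairs are assumptions made only for illustration, and the synonym expansion and the Tongyici Cilin component of the score are omitted.

```python
import math

# Hypothetical toy sense embeddings. In the paper these would be learned
# from a sense-disambiguated corpus; the tags and values here are invented.
SENSE_VECTORS = {
    "apple#fruit": [0.9, 0.1, 0.0],
    "apple#company": [0.1, 0.9, 0.2],
    "pear#fruit": [0.8, 0.2, 0.1],
}

# Hypothetical sense inventory, standing in for a lookup in Tongyici Cilin.
SENSES = {
    "apple": ["apple#fruit", "apple#company"],
    "pear": ["pear#fruit"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def word_similarity(w1, w2):
    """Best cosine score over all sense pairs of the two words.

    Taking the maximum is an assumed aggregation rule, not one stated
    in the abstract.
    """
    return max(
        cosine(SENSE_VECTORS[s1], SENSE_VECTORS[s2])
        for s1 in SENSES[w1]
        for s2 in SENSES[w2]
    )
```

With one vector per sense, the polysemous word "apple" is compared with "pear" through its fruit sense instead of through a single vector that blends both meanings.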
Pages: 1654-1669
Page count: 15