An improved Chinese word semantic similarity algorithm based on CiLin

Authors
Li, Fei [1 ]
Zhu, Xinhua [1 ]
Chen, Hongchao [1 ]
Ma, Runcong [1 ]
Deng, Han [1 ]
Affiliation
[1] Guangxi Key Lab. of Multi-Source Information Mining & Security and College of Computer Science & Information Technology, Guangxi Normal University, Guilin, China
Source
Journal of Information and Computational Science | 2015 / Vol. 12 / No. 10
Keywords
Correlation methods;
DOI
10.12733/jics20106030
Chinese Library Classification number
O212 [Mathematical Statistics];
Abstract
CiLin is a well-known semantic dictionary of Chinese synonyms whose structure and function closely resemble those of WordNet for English. This paper improves the existing CiLin-based algorithm for Chinese word semantic similarity; the improved algorithm integrates word distance, the density of the lowest common parent node, and the branch interval (branch layer spacing). First, an initial similarity value is computed from the word distance. An adjusting parameter that depends on the density n of the lowest common parent node and the branch interval k then revises this initial value downward. Because the adjustment takes the fourth root of an expression in k and n, the revision of the initial similarity is limited to below 16%, which avoids the unreasonable outcome in which word pairs that are close in distance receive a low similarity because of a large branch interval. On the Miller & Charles word-pair set, the method achieves a Pearson correlation coefficient of 0.8464 against human judgments. 1548-7741/Copyright © 2015 Binary Information Press
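The two-stage structure described in the abstract, and the Pearson evaluation it reports, can be illustrated with a minimal Python sketch. The abstract does not give the expression in k and n, so `expr` below is a hypothetical stand-in, assumed here to lie in [0.5, 1]; under that assumption its fourth root is at least about 0.841, which keeps the downward revision below 16%. The function names are illustrative and do not come from the paper.

```python
import math


def adjusted_similarity(initial_sim: float, expr: float) -> float:
    """Revise an initial similarity downward by the fourth root of `expr`.

    `expr` is a hypothetical placeholder for the paper's expression in
    k and n, which the abstract does not state. It is ASSUMED to lie in
    [0.5, 1], so the factor expr ** 0.25 lies in roughly [0.841, 1].
    """
    if not 0.5 <= expr <= 1.0:
        raise ValueError("expr is assumed to lie in [0.5, 1]")
    return initial_sim * expr ** 0.25


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, each up to the same 1/n factor,
    # which cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)
```

In an evaluation of the kind the abstract reports, `xs` would hold the algorithm's similarity scores for the word pairs and `ys` the corresponding human ratings.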
Pages: 3799 - 3807