A COMBINED MEASURE FOR TEXT SEMANTIC SIMILARITY

Authors
Li, Hao-Di [1]
Chen, Qing-Cai [1]
Wang, Xiao-Long [1]
Affiliation
[1] Harbin Inst Technol, Shenzhen Grad Sch, Comp Sci & Technol, Intelligent Comp Res Ctr, Harbin, Heilongjiang, Peoples R China
Keywords
Semantic similarity; Combination of rule-based and statistical measures; Sentence-level semantic similarity
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid development of artificial intelligence and natural language processing, text similarity calculation has become a core module of many applications, such as semantic disambiguation, information retrieval, automatic question answering and data mining. Most existing semantic similarity algorithms are either statistical methods or rule-based methods built on ontology dictionaries and knowledge bases. Rule-based methods usually rely on a dictionary, an ontology tree or graph, or attribute co-occurrence counts, while statistical methods may or may not use a knowledge base. Because a statistical method that uses a knowledge base incorporates more comprehensive knowledge and can reduce knowledge noise, it usually performs better. However, owing to the imbalanced distribution of items in a knowledge base, semantic similarity results for low-frequency words are usually poor. To address this issue, this thesis presents a combined measure for semantic similarity calculation. First, we study existing methods based on ontology dictionary rules and on corpus statistics, and compare their advantages and disadvantages. We then propose a method that combines rules and statistical measures for word-level semantic similarity, built on the English and Chinese Wikipedia databases and the HowNet semantic dictionary. For sentence-level semantic similarity, syntactic information, edit distance and semantic similarity are combined to improve performance. Experiments on English and Chinese standard corpora verify the proposed combined method, which achieves the best results among all compared methods of the same kind.
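The abstract names the ingredients of the combined measure but not the formulas that join them. The Python sketch below shows one plausible way to wire them together, under loudly stated assumptions: the HowNet and Wikipedia lookups are replaced by hypothetical toy tables (`toy_rule`, `toy_stat`), and the frequency cut-off `min_freq`, the weights `alpha` and `beta`, the best-match word alignment and the rule-only fallback for rare words are illustrative choices, not the authors' method. The syntactic-information component is omitted.

```python
from typing import Callable, Dict, List

# A word-pair similarity function returning a score in [0, 1] (hypothetical interface).
WordSim = Callable[[str, str], float]


def combined_word_similarity(w1: str, w2: str, rule_sim: WordSim, stat_sim: WordSim,
                             freq: Dict[str, int], min_freq: int = 50,
                             alpha: float = 0.5) -> float:
    """Blend a rule-based score (HowNet stand-in) with a statistical one (Wikipedia stand-in).

    Assumption: if either word is rarer than min_freq in the corpus, the
    statistical estimate is treated as unreliable and only the rule score is used.
    """
    if min(freq.get(w1, 0), freq.get(w2, 0)) < min_freq:
        return rule_sim(w1, w2)
    return alpha * rule_sim(w1, w2) + (1 - alpha) * stat_sim(w1, w2)


def token_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over word tokens (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        cur = [i]
        for j, tb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ta
                           cur[j - 1] + 1,             # insert tb
                           prev[j - 1] + (ta != tb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def sentence_similarity(s1: List[str], s2: List[str], word_sim: WordSim,
                        beta: float = 0.7) -> float:
    """Weighted mix of word-level semantic overlap and edit-distance similarity."""
    if not s1 or not s2:
        return 0.0

    def directed(a: List[str], b: List[str]) -> float:
        # Match every word in a to its most similar word in b, then average.
        return sum(max(word_sim(x, y) for y in b) for x in a) / len(a)

    semantic = (directed(s1, s2) + directed(s2, s1)) / 2
    # Normalize token edit distance into a similarity in [0, 1].
    surface = 1 - token_edit_distance(s1, s2) / max(len(s1), len(s2))
    return beta * semantic + (1 - beta) * surface


# Toy stand-ins for the knowledge sources (hypothetical, illustrative values only).
RULE_TABLE = {frozenset({"car", "automobile"}): 0.9}
STAT_TABLE = {frozenset({"car", "automobile"}): 0.6}
FREQ = {"car": 1000, "automobile": 20}


def toy_rule(a: str, b: str) -> float:
    return 1.0 if a == b else RULE_TABLE.get(frozenset({a, b}), 0.0)


def toy_stat(a: str, b: str) -> float:
    return 1.0 if a == b else STAT_TABLE.get(frozenset({a, b}), 0.0)


def toy_word_sim(a: str, b: str) -> float:
    return combined_word_similarity(a, b, toy_rule, toy_stat, FREQ)


if __name__ == "__main__":
    print(toy_word_sim("car", "automobile"))  # "automobile" is rare -> rule score 0.9
    print(round(sentence_similarity(["a", "car", "stops"],
                                    ["an", "automobile", "stops"], toy_word_sim), 3))  # 0.543
```

Falling back to the rule-based score when a word is rare is one simple way to act on the abstract's observation that statistical results for low-frequency words are poor; a smoother alternative would be to shrink `alpha` toward 1 as corpus frequency drops.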
Pages: 1869-1873
Number of pages: 5
Related papers
50 in total
  • [41] A Comment on "A Similarity Measure for Text Classification and Clustering"
    Nagwani, Naresh Kumar
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (09) : 2589 - 2590
  • [42] Consensus Similarity Measure for Short Text Clustering
    Shin, Youhyun
    Ahn, Yeonchan
    Jeon, Heesik
    Lee, Sang-goo
    2015 26TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA), 2015: 264 - 268
  • [43] Short Text Semantic Similarity Measurement Approach Based on Semantic Network
    Hameed, Naamah Hussien
    Alimi, Adel M.
    Sadiq, Ahmed T.
    BAGHDAD SCIENCE JOURNAL, 2022, 19 (06) : 1581 - 1591
  • [44] Enhancing semantic text similarity with functional semantic knowledge (FOP) in patents
    Teng, Hao
    Wang, Nan
    Zhao, Hongyu
    Hu, Yingtong
    Jin, Haitao
    JOURNAL OF INFORMETRICS, 2024, 18 (01)
  • [45] Semantic text similarity using corpus-based word similarity and string similarity
    University of Ottawa
    Unknown
    ACM Transactions on Knowledge Discovery from Data, 2008, 2 (02)
  • [46] Text Similarity Approach for SNOMED CT Primitive Concept Similarity Measure
    Htun, Htet Htet
    Sornlertlamvanich, Virach
    2017 8TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY FOR EMBEDDED SYSTEMS (IC-ICTES), 2017
  • [47] Semantic similarity and text summarization based novelty detection
    Kumar, Sushil
    Bhatia, Komal Kumar
    SN APPLIED SCIENCES, 2020, 2 (03)
  • [48] Semantic similarity metric and its application in text classification
    Zhang, Pei-ying
    PROGRESS IN CIVIL ENGINEERING, PTS 1-4, 2012, 170-173 : 3711 - 3714
  • [49] Enhancing Text Clustering Performance Using Semantic Similarity
    Gad, Walaa K.
    Kamel, Mohamed S.
    ENTERPRISE INFORMATION SYSTEMS-BK, 2009, 24 : 325 - 335
  • [50] TEXT CONTENT ANALYSIS USING ONTOLOGY AND SEMANTIC SIMILARITY
    Prodanovic, Dejan
    Furlan, Bojan
    Nikolic, Bosko
    2014 22ND TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2014, : 1126 - 1129