Sentence similarity based on semantic nets and corpus statistics

Cited by: 443
Authors
Li, Yuhua [1]
McLean, David
Bandar, Zuhair A.
O'Shea, James D.
Crockett, Keeley
Affiliations
[1] Univ Ulster, Sch Comp & Intelligent Syst, Coleraine BT48 7JL, Londonderry, North Ireland
[2] Manchester Metropolitan Univ, Dept Comp & Math, Manchester M1 5GD, Lancs, England
Keywords
sentence similarity; semantic nets; corpus; natural language processing; word similarity
DOI
10.1109/TKDE.2006.130
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentence similarity measures play an increasingly important role in text-related research and applications in areas such as text mining, Web page retrieval, and dialogue systems. Existing methods for computing sentence similarity have been adopted from approaches used for long text documents. These methods process sentences in a very high-dimensional space and are consequently inefficient, require human input, and are not adaptable to some application domains. This paper focuses directly on computing the similarity between very short texts of sentence length. It presents an algorithm that takes account of semantic information and word order information implied in the sentences. The semantic similarity of two sentences is calculated using information from a structured lexical database and from corpus statistics. The use of a lexical database enables our method to model human common sense knowledge and the incorporation of corpus statistics allows our method to be adaptable to different domains. The proposed method can be used in a variety of applications that involve text knowledge representation and discovery. Experiments on two sets of selected sentence pairs demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition.
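The abstract names two ingredients, semantic information and word order information, but gives no formulas. The following is a minimal Python sketch of one way such a combination could look, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the function names, the weight `delta`, the order threshold, and the toy `lexicon` table that stands in for a structured lexical database. Corpus statistics are omitted entirely.

```python
import math


def word_sim(w1, w2, lexicon):
    """Hypothetical word similarity: 1.0 for identical words, otherwise a
    lookup in a toy table (stand-in for a lexical database)."""
    if w1 == w2:
        return 1.0
    return lexicon.get(frozenset((w1, w2)), 0.0)


def semantic_vector(tokens, joint, lexicon):
    # For every word of the joint word set, keep its best match in the sentence.
    return [max(word_sim(w, t, lexicon) for t in tokens) for w in joint]


def order_vector(tokens, joint, lexicon, threshold=0.4):
    # For every joint word, store the 1-based position of its best match,
    # or 0 when no match reaches the (assumed) threshold.
    vec = []
    for w in joint:
        best_idx, best = 0, 0.0
        for i, t in enumerate(tokens, start=1):
            s = word_sim(w, t, lexicon)
            if s > best:
                best_idx, best = i, s
        vec.append(best_idx if best >= threshold else 0)
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def order_similarity(r1, r2):
    # 1 minus the normalized distance between the two position vectors.
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r1, r2)))
    return 1.0 - diff / total if total else 1.0


def sentence_similarity(s1, s2, lexicon=None, delta=0.85):
    """Weighted mix of semantic and word-order similarity (delta is assumed)."""
    lexicon = lexicon or {}
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0
    joint = list(dict.fromkeys(t1 + t2))  # joint word set, order preserved
    sem = cosine(semantic_vector(t1, joint, lexicon),
                 semantic_vector(t2, joint, lexicon))
    order = order_similarity(order_vector(t1, joint, lexicon),
                             order_vector(t2, joint, lexicon))
    return delta * sem + (1.0 - delta) * order


if __name__ == "__main__":
    toy = {frozenset(("dog", "puppy")): 0.8}  # made-up similarity value
    print(sentence_similarity("a dog bites a man", "a man bites a dog"))
    print(sentence_similarity("the dog barks", "the puppy barks", toy))
```

In this sketch, two sentences built from the same words in a different order share an identical semantic vector, so only the word-order term separates them. Swapping the toy table for a real lexical database and adding corpus-derived word weights would be the natural next step.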
Pages: 1138-1150
Page count: 13
Related papers
50 in total
  • [21] Using Sentence Semantic Similarity Based on WordNet in Recognizing Textual Entailment
    Castillo, Julio J.
    Cardenas, Marina E.
    ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2010, 2010, 6433 : 366 - 375
  • [22] SIMILARITY MEASURES BASED ON SENTENCE SEMANTIC STRUCTURE FOR RECOGNIZING PARAPHRASE AND ENTAILMENT
    Liu, Xiao-Ying
    Ren, Chuan-Lun
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 1601 - 1607
  • [23] Short Tamil Sentence Similarity Calculation using Knowledge-Based and Corpus-Based Similarity Measures
    Selvarasa, Anutharsha
    Thirunavukkarasu, Nilasini
    Rajendran, Niveathika
    Yogalingam, Chinthoorie
    Ranathunga, Surangika
    Dias, Gihan
    2017 3RD INTERNATIONAL MORATUWA ENGINEERING RESEARCH CONFERENCE (MERCON), 2017, : 443 - 448
  • [24] Semantic similarity, predictability, and models of sentence processing
    Roland, Douglas
    Yun, Hongoak
    Koenig, Jean-Pierre
    Mauner, Gail
    COGNITION, 2012, 122 (03) : 267 - 279
  • [25] A frequency enhanced algorithm of sentence semantic similarity
    Liao, Zhi-Fang
    Qiu, Li-Xia
    Xie, Yue-Shan
    Fan, Xiao-Ping
    Hunan Daxue Xuebao/Journal of Hunan University Natural Sciences, 2013, 40 (02) : 82 - 88
  • [26] FAST: A Fuzzy Semantic Sentence Similarity Measure
    Chandran, David
    Crockett, Keeley
    Mclean, David
    Bandar, Zuhair
    2013 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ - IEEE 2013), 2013,
  • [27] Semantic Word Error Rate For Sentence Similarity
    Spiccia, Carmelo
    Augello, Agnese
    Pilato, Giovanni
    Vassallo, Giorgio
    2016 IEEE TENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2016, : 265 - 268
  • [28] A Joint Model for Sentence Semantic Similarity Learning
    Wu, Di
    Huang, Jiuming
    Yang, Shuqiang
    2017 13TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG 2017), 2017, : 120 - 125
  • [29] Interpretable Semantic Textual Similarity for Indonesian Sentence
    Rajagukguk, Rio Chandra
    Khodra, Masayu Leylia
    2018 5TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATICS: CONCEPTS, THEORY AND APPLICATIONS (ICAICTA 2018), 2018, : 147 - 152
  • [30] Sentence Semantic Similarity Using Dependency Parsing
    Vakare, Tanmay
    Verma, Kshitij
    Jain, Vedant
    2019 10TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2019,