Measuring semantic similarity between words by removing noise and redundancy in web snippets

Cited by: 20
Authors
Xu, Zheng [1]
Luo, Xiangfeng [1]
Yu, Jie [1]
Xu, Weimin [1]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Ctr High Performance Comp, Shanghai 200072, Peoples R China
Source
Funding
National Science Foundation (USA);
Keywords
semantic similarity; information retrieval; query suggestion; Web search;
DOI
10.1002/cpe.1816
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. Because of the noise and redundancy hidden in Web data, robustness and accuracy remain challenges. In this paper, we propose a method integrating page counts and snippets returned by Web search engines. The semantic snippets and the number of search results are then used to remove noise and redundancy in the Web snippets (a 'Web snippet' includes the title, summary, and URL of a Web page returned by a search engine). After that, a method integrating page counts, semantic snippets, and the number of already displayed search results is proposed. The proposed method does not need any human-annotated knowledge (e.g., ontologies) and can easily be applied to Web-related tasks (e.g., query suggestion). A correlation coefficient of 0.851 against the Rubenstein-Goodenough benchmark dataset shows that the proposed method outperforms the existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with some page-count-based methods. Copyright (C) 2011 John Wiley & Sons, Ltd.
Pages: 2496 - 2510
Number of pages: 15
Related papers
50 in total
  • [21] A Laplacian Eigenmaps Based Semantic Similarity Measure between Words
    Wu, Yuming
    Cao, Cungen
    Wang, Shi
    Wang, Dongsheng
    INTELLIGENT INFORMATION PROCESSING V, 2010, 340 : 291 - 296
  • [22] Lexicographic Study of Synonymy: Clarifying Semantic Similarity between Words
    Gimaletdinova, Gulnara
    Khalitova, Liliia
    Solovyev, Valery
    Bochkarev, Vladimir
    COMPUTACION Y SISTEMAS, 2021, 25 (03): : 667 - 675
  • [23] Hybrid approach for semantic similarity calculation between Tamil words
    Karuppaiah D.
    Durai Raj Vincent P.M.
    International Journal of Innovative Computing and Applications, 2021, 12 (01) : 13 - 23
  • [24] Algorithmic Approach for Removing the Redundancy in Diabetic Gene Categories Based on Semantic Similarity and Gene Expression Data
    Atul Kumar
    D. Jeya Sundara Sharmila
    Interdisciplinary Sciences: Computational Life Sciences, 2016, 8 : 162 - 168
  • [25] An efficient technique for finding semantic similarity and their frequency between words
    Yadav, Sonu
    Sain, Deepak
    2013 INTERNATIONAL CONFERENCE ON GREEN COMPUTING, COMMUNICATION AND CONSERVATION OF ENERGY (ICGCE), 2013, : 159 - 163
  • [26] Algorithmic Approach for Removing the Redundancy in Diabetic Gene Categories Based on Semantic Similarity and Gene Expression Data
    Kumar, Atul
    Sharmila, D. Jeya Sundara
    INTERDISCIPLINARY SCIENCES-COMPUTATIONAL LIFE SCIENCES, 2016, 8 (02) : 162 - 168
  • [27] Novel Approach to Find Semantic Similarity Measure between Words
    Sahni, Lakshay
    Sehgal, Anubhav
    Kochar, Shaivi
    Ahmad, Faiyaz
    Ahmad, Tanvir
    PROCEEDINGS OF 2014 2ND INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL AND BUSINESS INTELLIGENCE (ISCBI), 2014, : 89 - 92
  • [28] Measuring Semantic Similarity between Concepts in Visual Domain
    Wang, Zhiyong
    Guan, Genliang
    Wang, Jiajun
    Feng, Dagan
    2008 IEEE 10TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, VOLS 1 AND 2, 2008, : 632 - +
  • [29] Measuring semantic similarity between Gene Ontology terms
    Couto, Francisco M.
    Silva, Mario J.
    Coutinho, Pedro M.
    DATA & KNOWLEDGE ENGINEERING, 2007, 61 (01) : 137 - 152
  • [30] Measuring semantic similarity between geospatial conceptual regions
    Schwering, A
    Raubal, M
    GEOSPATIAL SEMANTICS, PROCEEDINGS, 2005, 3799 : 90 - 106