Measuring semantic similarity between words by removing noise and redundancy in web snippets

Cited by: 20
Authors
Xu, Zheng [1 ]
Luo, Xiangfeng [1 ]
Yu, Jie [1 ]
Xu, Weimin [1 ]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Ctr High Performance Comp, Shanghai 200072, Peoples R China
Source
Funding
National Science Foundation (USA);
Keywords
semantic similarity; information retrieval; query suggestion; Web search;
DOI
10.1002/cpe.1816
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. Because of the noise and redundancy hidden in Web data, robustness and accuracy remain challenges. In this paper, we propose a method integrating page counts and snippets returned by Web search engines. The semantic snippets and the number of search results are then used to remove noise and redundancy in the Web snippets (a 'Web snippet' includes the title, summary, and URL of a Web page returned by a search engine). After that, a method integrating page counts, semantic snippets, and the number of already displayed search results is proposed. The proposed method does not need any human-annotated knowledge (e.g., ontologies) and can be applied to Web-related tasks (e.g., query suggestion) easily. A correlation coefficient of 0.851 against the Rubenstein-Goodenough benchmark dataset shows that the proposed method outperforms the existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with some page-count-based methods. Copyright (C) 2011 John Wiley & Sons, Ltd.
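As a rough illustration of what a page-count-based similarity measure can look like, the sketch below computes a Jaccard coefficient from search-engine hit counts. This is a generic baseline of the kind the abstract compares against, not the authors' proposed method; the function name `web_jaccard` and the example counts are hypothetical, and no real search API is called.

```python
def web_jaccard(count_p: int, count_q: int, count_pq: int) -> float:
    """Jaccard coefficient from page counts.

    count_p  -- hits for the query "P" (hypothetical input)
    count_q  -- hits for the query "Q" (hypothetical input)
    count_pq -- hits for the conjunctive query "P AND Q" (hypothetical input)
    """
    # No co-occurrence means no evidence of similarity.
    if count_pq <= 0:
        return 0.0
    # Pages containing P or Q, counted once each.
    union = count_p + count_q - count_pq
    if union <= 0:
        return 0.0
    return count_pq / union


# Made-up page counts, for illustration only.
score = web_jaccard(count_p=1000, count_q=500, count_pq=250)
print(round(score, 3))
```

A measure like this relies only on raw hit counts, so it is sensitive to the noise the abstract mentions; the paper's contribution is to combine such counts with snippet information.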
Pages: 2496-2510
Page count: 15
Related papers
50 in total
  • [1] Measuring Semantic Similarity between Words Using Web Documents
    Takale, Sheetal A.
    Nandgaonkar, Sushma S.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2010, 1 (04) : 78 - 85
  • [2] A Survey on Semantic Similarity between Words in Semantic Web
    Ilakiya, P.
    Sumathi, M.
    Karthik, S.
    2012 INTERNATIONAL CONFERENCE ON RADAR, COMMUNICATION AND COMPUTING (ICRCC), 2012, : 213 - 216
  • [3] Measuring Semantic Similarity between Words Using Wikipedia
    Lu Zhiqiang
    Shao Werimin
    Yu Zhenhua
    WISM: 2009 INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, : 251 - +
  • [4] Measuring Semantic Similarity between Words Using HowNet
    Dai, Liuling
    Liu, Bin
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, 2008, : 601 - +
  • [5] An Integrated Approach for Measuring Semantic Similarity between Words and Sentences using Web Search Engine
    Adhikesavan, Kavitha
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2015, 12 (06) : 589 - 596
  • [6] A fuzzy approach for measuring the semantic similarity between words in WordNet
    Song, Ling
    Ma, Jun
    Lei, Jingsheng
    Li, Chao
    Journal of Information and Computational Science, 2009, 6 (03) : 1673 - 1680
  • [7] Efficient Search Engine Approach for Measuring Similarity between words Using Page Count and Snippets
    Murugesan, P.
    Malathi, K.
    PROCEEDINGS OF 2015 ONLINE INTERNATIONAL CONFERENCE ON GREEN ENGINEERING AND TECHNOLOGIES (IC-GET), 2015,
  • [8] Measuring semantic similarity between words using multiple information sources
    Lei, Jingsheng
    Journal of Information and Computational Science, 2010, 7 (02) : 601 - 608
  • [9] An Ontology-Based Approach for Measuring Semantic Similarity Between Words
    Zhang, Ruiling
    Xiong, Shengwu
    Chen, Zhong
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, ICIC 2015, PT III, 2015, 9227 : 510 - 516
  • [10] Measuring Semantic Similarity between Words Based on Multiple Relational Information
    Duan, Jianyong
    Wu, Yuwei
    Wu, Mingli
    Wang, Hao
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (01) : 163 - 169