Measuring semantic similarity between words by removing noise and redundancy in web snippets

Cited by: 20
Authors
Xu, Zheng [1]
Luo, Xiangfeng [1]
Yu, Jie [1]
Xu, Weimin [1]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Ctr High Performance Comp, Shanghai 200072, Peoples R China
Source
Funding
U.S. National Science Foundation
Keywords
semantic similarity; information retrieval; query suggestion; Web search
DOI
10.1002/cpe.1816
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. Because of the noise and redundancy hidden in Web data, robustness and accuracy remain challenges. In this paper, we propose a method integrating page counts and snippets returned by Web search engines. Semantic snippets and the number of search results are then used to remove noise and redundancy in the Web snippets (a 'Web snippet' includes the title, summary, and URL of a Web page returned by a search engine). After that, a method integrating page counts, semantic snippets, and the number of already displayed search results is proposed. The proposed method does not need any human-annotated knowledge (e.g., ontologies) and can easily be applied to Web-related tasks (e.g., query suggestion). A correlation coefficient of 0.851 against the Rubenstein-Goodenough benchmark dataset shows that the proposed method outperforms the existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with some page-count-based methods. Copyright (C) 2011 John Wiley & Sons, Ltd.
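For readers unfamiliar with the page-count-based baselines the abstract compares against, the following is a minimal sketch of one widely used measure of that kind, a Jaccard coefficient computed from search-engine hit counts. It is not the method proposed in this paper. The function name, the co-occurrence threshold, and all page counts below are hypothetical values chosen for illustration only.

```python
def web_jaccard(count_p: int, count_q: int, count_pq: int,
                min_cooccurrence: int = 5) -> float:
    """Jaccard coefficient from page counts (illustrative baseline only).

    count_p  -- hits for the query "P"
    count_q  -- hits for the query "Q"
    count_pq -- hits for the conjunctive query "P AND Q"
    min_cooccurrence -- assumed threshold; very small joint counts are
                        treated as noise and mapped to zero similarity
    """
    if count_pq <= min_cooccurrence:
        return 0.0
    # |P and Q| / |P or Q|, with the union obtained by inclusion-exclusion
    return count_pq / (count_p + count_q - count_pq)


# Hypothetical page counts, not taken from any real search engine
score = web_jaccard(count_p=1_000_000, count_q=500_000, count_pq=250_000)
print(score)
```

A measure like this depends only on raw hit counts, so it is sensitive to the noise and redundancy that the abstract describes; the paper's contribution is to additionally use snippet content and the number of displayed results.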
Pages: 2496 - 2510
Page count: 15
Related papers
50 in total
  • [11] Measuring semantic similarity between named entities by searching the web directory
    Liu, Jiahui
    Birnbaum, Larry
    PROCEEDINGS OF THE IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE: WI 2007, 2007, : 461 - +
  • [12] An approach for measuring semantic similarity between words using multiple information sources
    Li, YH
    Bandar, ZA
    McLean, D
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003, 15 (04) : 871 - 882
  • [13] Measuring semantic similarity between words using lexical knowledge and neural networks
    Li, YH
    Bandar, Z
    McLean, D
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2002, 2002, 2412 : 111 - 116
  • [14] A Web Search Engine-Based Approach to Measure Semantic Similarity between Words
    Bollegala, Danushka
    Matsuo, Yutaka
    Ishizuka, Mitsuru
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (07) : 977 - 990
  • [15] A graph modeling of semantic similarity between words
    Alvarez, Marco A.
    Lim, SeungJin
    ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS, 2007, : 355 - +
  • [16] Measure Semantic Similarity between English Words
    Hu, Jinwu
    Dai, Liuling
    Liu, Bin
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS, VOLS 1-5, 2008, : 1689 - +
  • [17] Measuring Semantic Similarity Using Web Search Engine
    Shanmugapriya
    Latha, K.
    2013 INTERNATIONAL CONFERENCE ON ADVANCED NANOMATERIALS AND EMERGING ENGINEERING TECHNOLOGIES (ICANMEET), 2013, : 639 - 644
  • [18] Measuring Semantic Similarity Between Digital Forensics Terminologies Using Web Search Engines
    Karie, Nickson M.
    Venter, Hein S.
    2012 INFORMATION SECURITY FOR SOUTH AFRICA (ISSA), 2012,
  • [19] Measuring the Strength of the Semantic Relationship Between Words
    Stanchev, Lubomir
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2015, 24 (02)
  • [20] Semantic Relation between Words with the Web as Information Source
    Basu, Tanmay
    Murthy, C. A.
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PROCEEDINGS, 2009, 5909 : 267 - 272