Measuring semantic similarity between words by removing noise and redundancy in web snippets

Citations: 20
Authors
Xu, Zheng [1]
Luo, Xiangfeng [1]
Yu, Jie [1]
Xu, Weimin [1]
Affiliation
[1] Shanghai Univ, Sch Comp Engn & Sci, Ctr High Performance Comp, Shanghai 200072, Peoples R China
Source
Funding
US National Science Foundation;
Keywords
semantic similarity; information retrieval; query suggestion; Web search;
DOI
10.1002/cpe.1816
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. Because of the noise and redundancy hidden in Web data, robustness and accuracy are still challenges. In this paper, we propose a method integrating page counts and snippets returned by Web search engines. Then, the semantic snippets and the number of search results are used to remove noise and redundancy in the Web snippets (a 'Web snippet' includes the title, summary, and URL of a Web page returned by a search engine). After that, a method integrating page counts, semantic snippets, and the number of already displayed search results is proposed. The proposed method does not need any human-annotated knowledge (e.g., ontologies), and can easily be applied to Web-related tasks (e.g., query suggestion). A correlation coefficient of 0.851 against the Rubenstein-Goodenough benchmark dataset shows that the proposed method outperforms the existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with some page-count-based methods. Copyright (C) 2011 John Wiley & Sons, Ltd.
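The abstract compares against page-count-based measures and reports a correlation with human ratings. As a rough illustration of that general setup only, the sketch below computes a Jaccard-style co-occurrence score from search-engine page counts and a Pearson correlation against reference ratings. It is not the paper's method: the function names, the `min_cooccurrence` threshold, and all numbers are hypothetical placeholders, and no real search engine is queried.

```python
from math import sqrt


def page_count_jaccard(count_p, count_q, count_pq, min_cooccurrence=5):
    """Jaccard-style similarity from page counts (generic baseline, assumed here).

    count_p, count_q: hit counts for each word queried alone.
    count_pq: hit count for the conjunctive query "p AND q".
    min_cooccurrence: hypothetical cutoff that treats very small joint
    counts as noise and scores them as 0.
    """
    if count_pq <= min_cooccurrence:
        return 0.0
    denominator = count_p + count_q - count_pq
    return count_pq / denominator if denominator > 0 else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


if __name__ == "__main__":
    # Invented page counts for three word pairs: (count_p, count_q, count_pq).
    invented_counts = [(1000, 800, 600), (1200, 900, 150), (700, 500, 20)]
    # Invented human ratings for the same pairs (placeholders, not benchmark data).
    invented_ratings = [3.9, 1.8, 0.4]

    scores = [page_count_jaccard(p, q, pq) for p, q, pq in invented_counts]
    print("similarity scores:", scores)
    print("correlation with ratings:", pearson(scores, invented_ratings))
```

A measure of this kind uses page counts alone, which is the type of baseline the abstract says the proposed combination of page counts and snippets is compared against.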
Pages: 2496-2510
Page count: 15