Measuring semantic similarity between words by removing noise and redundancy in web snippets

Cited by: 20
Authors
Xu, Zheng [1 ]
Luo, Xiangfeng [1 ]
Yu, Jie [1 ]
Xu, Weimin [1 ]
Affiliation
[1] Shanghai Univ, Sch Comp Engn & Sci, Ctr High Performance Comp, Shanghai 200072, Peoples R China
Source
Funding
National Science Foundation (United States);
Keywords
semantic similarity; information retrieval; query suggestion; Web search;
DOI
10.1002/cpe.1816
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. Because of the noise and redundancy hidden in Web data, robustness and accuracy remain challenges. In this paper, we propose a method integrating page counts and snippets returned by Web search engines. The semantic snippets and the number of search results are then used to remove noise and redundancy in the Web snippets (a 'Web snippet' includes the title, summary, and URL of a Web page returned by a search engine). After that, a method integrating page counts, semantic snippets, and the number of already displayed search results is proposed. The proposed method does not need any human-annotated knowledge (e.g., ontologies) and can easily be applied to Web-related tasks (e.g., query suggestion). A correlation coefficient of 0.851 against the Rubenstein-Goodenough benchmark dataset shows that the proposed method outperforms the existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with some page-count-based methods. Copyright (C) 2011 John Wiley & Sons, Ltd.
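The abstract does not give the paper's formulas, so the following is only an illustrative sketch of the general idea behind page-count-based similarity and of how such a measure can be scored against a human-rated benchmark. It uses a generic Jaccard coefficient over page counts, which is an assumption chosen for illustration and not the method proposed in the paper. The function names, the `min_cooccurrence` threshold, and all numbers are hypothetical.

```python
import math


def page_count_jaccard(count_p, count_q, count_pq, min_cooccurrence=5):
    """Jaccard coefficient over search-engine page counts (illustrative only).

    count_p, count_q: page counts for the single-word queries P and Q.
    count_pq: page count for the conjunctive query "P AND Q".
    min_cooccurrence: hypothetical threshold; very small co-occurrence
    counts are treated as noise and mapped to a similarity of 0.
    """
    if count_pq <= min_cooccurrence:
        return 0.0
    return count_pq / (count_p + count_q - count_pq)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Made-up page counts for three word pairs: (count_p, count_q, count_pq).
    counts = [(1000, 800, 300), (5000, 4000, 40), (2000, 2000, 1000)]
    scores = [page_count_jaccard(p, q, pq) for p, q, pq in counts]
    # Made-up human ratings for the same pairs, standing in for a benchmark.
    human = [2.5, 0.4, 3.6]
    print(scores)
    print(pearson(scores, human))
```

A reported figure such as the 0.851 correlation in the abstract is obtained in this general way: the measure is computed for every benchmark word pair and the resulting scores are correlated with the human ratings.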
Pages: 2496-2510
Number of pages: 15