Text Relatedness Based on a Word Thesaurus

Cited by: 86
Authors
Tsatsaronis, George [1, 3]
Varlamis, Iraklis [2]
Vazirgiannis, Michalis [3]
Affiliations
[1] Norwegian Univ Sci & Technol, Dept Comp & Informat Sci, Trondheim, Norway
[2] Harokopio Univ, Dept Informat & Telemat, Athens, Greece
[3] Athens Univ Econ & Business, Dept Informat, Athens, Greece
Keywords
SIMILARITY; WIKIPEDIA;
DOI
10.1613/jair.2880
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. Such a measure that captures well both aspects of text relatedness may help in many tasks, such as text retrieval, classification and clustering. In this paper we present a new approach for measuring the semantic relatedness between words based on their implicit semantic links. The approach exploits only a word thesaurus in order to devise implicit semantic links between words. Based on this approach, we introduce Omiotis, a new measure of semantic relatedness between texts which capitalizes on the word-to-word semantic relatedness measure (SR) and extends it to measure the relatedness between texts. We gradually validate our method: we first evaluate the performance of the semantic relatedness measure between individual words, covering word-to-word similarity and relatedness, synonym identification and word analogy; then, we proceed with evaluating the performance of our method in measuring text-to-text semantic relatedness in two tasks, namely sentence-to-sentence similarity and paraphrase recognition. Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness in the selected tasks and the used data sets, and competes well against corpus-based and hybrid approaches.
Pages: 1 - 39
Number of pages: 39
Related papers
50 in total
  • [1] Omiotis: A Thesaurus-Based Measure of Text Relatedness
    Tsatsaronis, George
    Varlamis, Iraklis
    Vazirgiannis, Michalis
    Norvag, Kjetil
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 742 - +
  • [2] Improving text relatedness by incorporating phrase relatedness with word relatedness
    Rakib, Rashadul Hasan
    Islam, Aminul
    Milios, Evangelos
    [J]. COMPUTATIONAL INTELLIGENCE, 2018, 34 (03) : 939 - 966
  • [3] Text categorization algorithms using semantic approaches, corpus-based thesaurus and WordNet
    Li, Cheng Hua
    Yang, Ju Cheng
    Park, Soon Cheol
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (01) : 765 - 772
  • [4] Word Similarity Computing Based on HowNet and Synonymy Thesaurus
    Nie, Hongmei
    Zhou, Jiaqing
    Wang, Hui
    Li, Minshuo
    [J]. INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2020, 1038 : 292 - 305
  • [5] Short text classification based on strong feature thesaurus
    Wang, Bing-kun
    Huang, Yong-feng
    Yang, Wan-xia
    Li, Xing
    [J]. JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE C-COMPUTERS & ELECTRONICS, 2012, 13 (09) : 649 - 659
  • [6] Short text classification based on strong feature thesaurus
    Bing-kun Wang
    Yong-feng Huang
    Wan-xia Yang
    Xing Li
    [J]. Journal of Zhejiang University SCIENCE C, 2012, 13 : 649 - 659
  • [7] Short text model based on Strong feature thesaurus
    Lu, Wentao
    Huang, Yongfeng
    Li, Xing
    Zhang, Zhuo
    Li, Yingkun
    [J]. PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS RESEARCH AND MECHATRONICS ENGINEERING, 2015, 121 : 620 - 625
  • [9] WORD FINDER - ELECTRONIC THESAURUS
    LASBO, P
    [J]. ONLINE REVIEW, 1988, 12 (01) : 59 - 61
  • [10] WORD FINDER - AN ELECTRONIC THESAURUS
    JUDY, JR
    [J]. ELECTRONIC LIBRARY, 1985, 3 (03) : 176 - 177