Text Relatedness Based on a Word Thesaurus

Cited by: 86
Authors
Tsatsaronis, George [1, 3]
Varlamis, Iraklis [2]
Vazirgiannis, Michalis [3]
Affiliations
[1] Norwegian Univ Sci & Technol, Dept Comp & Informat Sci, Trondheim, Norway
[2] Harokopio Univ, Dept Informat & Telemat, Athens, Greece
[3] Athens Univ Econ & Business, Dept Informat, Athens, Greece
Keywords
SIMILARITY; WIKIPEDIA
DOI
10.1613/jair.2880
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. Such a measure that captures well both aspects of text relatedness may help in many tasks, such as text retrieval, classification and clustering. In this paper we present a new approach for measuring the semantic relatedness between words based on their implicit semantic links. The approach exploits only a word thesaurus in order to devise implicit semantic links between words. Based on this approach, we introduce Omiotis, a new measure of semantic relatedness between texts which capitalizes on the word-to-word semantic relatedness measure (SR) and extends it to measure the relatedness between texts. We gradually validate our method: we first evaluate the performance of the semantic relatedness measure between individual words, covering word-to-word similarity and relatedness, synonym identification and word analogy; then, we proceed with evaluating the performance of our method in measuring text-to-text semantic relatedness in two tasks, namely sentence-to-sentence similarity and paraphrase recognition. Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness in the selected tasks and the used data sets, and competes well against corpus-based and hybrid approaches.
Pages: 1 - 39
Page count: 39
Related papers
50 in total
  • [41] Text Similarity Function Based on Word Embeddings for Short Text Analysis
    Pascual, Adrian Jimenez
    Fujita, Sumio
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2017), PT I, 2018, 10761 : 391 - 402
  • [42] WORD-FREQUENCY AND ITS PLACE IN A THESAURUS
    SHEKHTMAN, NA
    [J]. NAUCHNO-TEKHNICHESKAYA INFORMATSIYA SERIYA 2-INFORMATSIONNYE PROTSESSY I SISTEMY, 1978, (05): 20 - 21
  • [43] The word association test in the methodology of thesaurus construction
    Nielsen, ML
    [J]. ADVANCES IN CLASSIFICATION RESEARCH, VOL 8, 1998: 43 - 58
  • [44] Wikipedia-Based Relatedness Measurements for Multilingual Short Text Clustering
    Nakamura, Tatsuya
    Shirakawa, Masumi
    Hara, Takahiro
    Nishio, Shojiro
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2019, 18 (02)
  • [45] Multiple label text categorization on a hierarchical thesaurus
    Ribadas, Francisco J.
    Lloves, Erica
    Darriba, Victor M.
    [J]. COMPUTER AIDED SYSTEMS THEORY - EUROCAST 2007, 2007, 4739 : 297 - +
  • [46] Automatic thesaurus for enhanced Chinese text retrieval
    Foo, S
    Hui, SC
    Lim, HK
    Hui, L
    [J]. LIBRARY COMPUTING, 2000, 19 (3-4): 270 - 280
  • [47] THESAURUS MODELLING OF THE "TEXT LINGUISTICS" TERMINOLOGICAL FIELD
    Zhuchkova, Irina Igorevna
    [J]. VESTNIK VOLGOGRADSKOGO GOSUDARSTVENNOGO UNIVERSITETA-SERIYA 2-YAZYKOZNANIE, 2014, 13 (02): 53 - 59
  • [48] Using Thesaurus to Improve Multiclass Text Classification
    Maghsoodi, Nooshin
    Homayounpour, Mohammad Mehdi
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PT II, 2011, 6609 : 244 - 253
  • [49] Enhancing Text-Based Relatedness Measures with Semantic Web Data
    Gjorgjevikj, Ana
    Stojanov, Riste
    Trajanov, Dimitar
    [J]. ICT INNOVATIONS 2016: COGNITIVE FUNCTIONS AND NEXT GENERATION ICT SYSTEMS, 2018, 665 : 182 - 192
  • [50] A RATING SCALE MEASURE OF WORD RELATEDNESS
    GENTILE, JR
    SEIBEL, R
    [J]. JOURNAL OF VERBAL LEARNING AND VERBAL BEHAVIOR, 1969, 8 (02): 252 - &