Improving text relatedness by incorporating phrase relatedness with word relatedness

Cited by: 2
Authors
Rakib, Rashadul Hasan [1]
Islam, Aminul [2]
Milios, Evangelos [1]
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 1W5, Canada
[2] Univ Louisiana Lafayette, Sch Comp & Informat, Lafayette, LA 70504 USA
Keywords
semantic relatedness; semantic similarity; text mining; text relatedness; text similarity; SIMILARITY; MODELS
DOI
10.1111/coin.12152
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text is composed of words and phrases. In the bag-of-words model, phrases in text are split into words. This may discard the semantics of the phrases, which in turn may yield an inconsistent relatedness score between two texts. Our objective is to apply phrase relatedness in conjunction with word relatedness to the text relatedness task in order to improve performance. We adopt two existing word relatedness measures, one based on Google n-grams and the other on Global Vectors for Word Representation (GloVe), and incorporate each of them differently with an existing Google n-gram-based phrase relatedness method to compute text relatedness. The combination of Google n-gram-based word and phrase relatedness performs better than Google n-gram-based word relatedness alone, achieving a higher weighted mean of Pearson's r (0.639 versus 0.619) on the 14 data sets from the Semantic Evaluation workshops SemEval-2012, SemEval-2013, and SemEval-2015. Similarly, the combination of GloVe-based word relatedness and Google n-gram-based phrase relatedness performs better than GloVe-based word relatedness alone, achieving a higher weighted mean of Pearson's r (0.619 versus 0.605) on the same 14 data sets. On the SemEval-2012, SemEval-2013, and SemEval-2015 data sets, the text relatedness results obtained from the combination of Google n-gram-based word and phrase relatedness ranked 24th, 3rd, and 31st out of 89, 90, and 73 text relatedness systems, respectively.
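The headline metric in the abstract, a weighted mean of Pearson's r over several data sets, can be illustrated with a minimal sketch. The abstract does not state how the weights are chosen; the sketch assumes each data set is weighted by its number of text pairs, which is a common convention in SemEval text similarity evaluations. The function names `pearson_r` and `weighted_mean_r` and the toy scores are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def weighted_mean_r(datasets):
    """Mean of per-data-set Pearson's r, weighted by number of pairs.

    `datasets` is a list of (gold_scores, predicted_scores) tuples.
    Weighting by data set size is an assumption of this sketch.
    """
    total_pairs = sum(len(gold) for gold, _ in datasets)
    return sum(len(gold) * pearson_r(gold, pred)
               for gold, pred in datasets) / total_pairs


# Toy data: one data set with 4 pairs, one with 2 pairs.
toy = [
    ([0, 1, 2, 3], [0, 2, 4, 6]),
    ([0, 1], [1, 0]),
]
print(weighted_mean_r(toy))
```

Under this weighting, larger data sets contribute proportionally more to the overall score than they would in an unweighted average of the per-data-set correlations.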
Pages: 939-966
Page count: 28