Exploiting semantic relationships for unsupervised expansion of sentiment lexicons

Cited by: 10
Authors
Viegas, Felipe [1 ]
Alvim, Mario S. [1 ]
Canuto, Sergio [1 ]
Rosa, Thierson [3 ]
Goncalves, Marcos Andre [1 ]
Rocha, Leonardo [2 ]
Affiliations
[1] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Univ Fed Sao Joao Del Rei, Dept Comp Sci, Sao Joao Del Rei, Brazil
[3] Univ Fed Goias, Inst Informat, Goiania, Go, Brazil
Keywords
Sentiment analysis; Lexicon dictionary; Word embeddings; Lexicon expansion; WORDS;
DOI
10.1016/j.is.2020.101606
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The literature in sentiment analysis has widely assumed that semantic relationships between words cannot be effectively exploited to produce satisfactory sentiment lexicon expansions. This assumption stems from the fact that words considered to be "close" in a semantic space (e.g., word embeddings) may present completely opposite polarities, which might suggest that sentiment information in such spaces is either too faint, or at least not readily exploitable. Our main contribution in this paper is a rigorous and robust challenge to this assumption: by proposing a set of theoretical hypotheses and corroborating them with strong experimental evidence, we demonstrate that semantic relationships can be effectively used for good lexicon expansion. Based on these results, our second contribution is a novel, simple, and yet effective lexicon-expansion strategy based on semantic relationships extracted from word embeddings. This strategy is able to substantially enhance the lexicons, whilst overcoming the major problem of lexicon coverage. We present an extensive experimental evaluation of sentence-level sentiment analysis, comparing our approach to sixteen state-of-the-art (SOTA) lexicon-based and five lexicon expansion methods, over twenty datasets. Results show that in the vast majority of cases our approach outperforms the alternatives, achieving coverage of almost 100% and gains of about 26% against the best baselines. Moreover, our unsupervised approach performed competitively against SOTA supervised sentiment analysis methods, mainly in scenarios with scarce information. Finally, in a cross-dataset comparison, our approach turned out to be as competitive as (i.e., statistically tied with) state-of-the-art supervised solutions such as pre-trained transformers (BERT), even without relying on any training (labeled) data. Indeed, in small datasets or in datasets with scarce information (short messages), our solution outperformed the supervised ones by large margins.
(C) 2020 Elsevier Ltd. All rights reserved.
Pages: 16