Exploiting semantic relationships for unsupervised expansion of sentiment lexicons

Cited by: 10
Authors
Viegas, Felipe [1]
Alvim, Mario S. [1]
Canuto, Sergio [1]
Rosa, Thierson [3]
Goncalves, Marcos Andre [1]
Rocha, Leonardo [2]
Affiliations
[1] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
[2] Univ Fed Sao Joao Del Rei, Dept Comp Sci, Sao Joao Del Rei, Brazil
[3] Univ Fed Goias, Inst Informat, Goiania, Go, Brazil
Keywords
Sentiment analysis; Lexicon dictionary; Word embeddings; Lexicon expansion; WORDS
DOI
10.1016/j.is.2020.101606
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The literature in sentiment analysis has widely assumed that semantic relationships between words cannot be effectively exploited to produce satisfactory sentiment lexicon expansions. This assumption stems from the fact that words considered to be "close" in a semantic space (e.g., word embeddings) may present completely opposite polarities, which might suggest that sentiment information in such spaces is either too faint, or at least not readily exploitable. Our main contribution in this paper is a rigorous and robust challenge to this assumption: by proposing a set of theoretical hypotheses and corroborating them with strong experimental evidence, we demonstrate that semantic relationships can be effectively used for good lexicon expansion. Based on these results, our second contribution is a novel, simple, and yet effective lexicon-expansion strategy based on semantic relationships extracted from word embeddings. This strategy is able to substantially enhance the lexicons, whilst overcoming the major problem of lexicon coverage. We present an extensive experimental evaluation of sentence-level sentiment analysis, comparing our approach to sixteen state-of-the-art (SOTA) lexicon-based and five lexicon expansion methods, over twenty datasets. Results show that in the vast majority of cases our approach outperforms the alternatives, achieving coverage of almost 100% and gains of about 26% against the best baselines. Moreover, our unsupervised approach performed competitively against SOTA supervised sentiment analysis methods, mainly in scenarios with scarce information. Finally, in a cross-dataset comparison, our approach turned out to be as competitive as (i.e., statistically tied with) state-of-the-art supervised solutions such as pre-trained transformers (BERT), even without relying on any training (labeled) data. Indeed, in small datasets or in datasets with scarce information (short messages), our solution outperformed the supervised ones by large margins.
(C) 2020 Elsevier Ltd. All rights reserved.
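The abstract does not spell out the expansion strategy, so the following is only a generic sketch of the underlying idea: propagating polarity from a seed lexicon to nearby words in an embedding space. It is not the authors' method. The function names (`cosine`, `expand_lexicon`), the parameters `k` and `min_sim`, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seed, embeddings, k=2, min_sim=0.5):
    """Assign a polarity to each out-of-lexicon word from its nearest seed words.

    Hypothetical helper: each new word gets the similarity-weighted mean
    polarity of its k most similar seed words, keeping only neighbours
    whose cosine similarity is at least min_sim.
    """
    expanded = dict(seed)
    for word, vec in embeddings.items():
        if word in seed:
            continue
        # Rank seed words by similarity to the candidate word.
        neighbours = sorted(
            ((cosine(vec, embeddings[s]), pol)
             for s, pol in seed.items() if s in embeddings),
            reverse=True,
        )[:k]
        neighbours = [(sim, pol) for sim, pol in neighbours if sim >= min_sim]
        if not neighbours:
            continue  # no sufficiently close seed word: leave the word uncovered
        total = sum(sim for sim, _ in neighbours)
        expanded[word] = sum(sim * pol for sim, pol in neighbours) / total
    return expanded


# Toy 2-D "embeddings" (made up for the example, not real vectors).
embeddings = {
    "good": [1.0, 0.1],
    "great": [0.9, 0.2],
    "bad": [-1.0, 0.1],
    "awful": [-0.9, 0.2],
    "table": [0.0, 1.0],
}
seed = {"good": 1.0, "bad": -1.0}

expanded = expand_lexicon(seed, embeddings)
print(expanded)
```

In this toy space "great" and "awful" inherit the polarity of their nearest seed word, while "table" stays uncovered because no seed word is close enough. Real embeddings often place antonyms close together, which is the difficulty the abstract refers to and which a naive nearest-neighbour scheme like this one does not address.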
Pages: 16
Related papers
50 in total
  • [1] Morphological Relations for the Automatic Expansion of Italian Sentiment Lexicons
    Pelosi, Serena
    [J]. AUTOMATIC PROCESSING OF NATURAL-LANGUAGE ELECTRONIC TEXTS WITH NOOJ, 2016, 607 : 41 - 51
  • [2] Unsupervised Fine-Grained Sentiment Analysis System Using Lexicons and Concepts
    Ofek, Nir
    Rokach, Lior
    [J]. SEMANTIC WEB EVALUATION CHALLENGE, 2014, 475 : 28 - 33
  • [3] A semantic similarity-based perspective of affect lexicons for sentiment analysis
    Araque, Oscar
    Zhu, Ganggao
    Iglesias, Carlos A.
    [J]. KNOWLEDGE-BASED SYSTEMS, 2019, 165 : 346 - 359
  • [4] Sentiment Analysis: Arabic Sentiment Lexicons
    Sabra, Khaled S. i
    Zantout, Rached N.
    El Abed, Mohamad A.
    Hamandi, Lama
    [J]. 2017 SENSORS NETWORKS SMART AND EMERGING TECHNOLOGIES (SENSET), 2017,
  • [5] Exploiting the Semantic Web for Unsupervised Natural Language Semantic Parsing
    Tur, Gokhan
    Jeong, Minwoo
    Wang, Ye-Yi
    Hakkani-Tuer, Dilek
    Heck, Larry
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 338 - 341
  • [6] Review on Sentiment Lexicons
    Jagdale, Rajkumar S.
    Shirsat, Vishal S.
    Deshmukh, Sachin N.
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON COMMUNICATION AND ELECTRONICS SYSTEMS (ICCES 2018), 2018, : 1105 - 1110
  • [7] Exploiting strong syntactic heuristics and co-training to learn semantic lexicons
    Phillips, W
    Riloff, E
    [J]. PROCEEDINGS OF THE 2002 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2002, : 125 - 132
  • [8] Learning Ranked Sentiment Lexicons
    Peleja, Filipa
    Magalhaes, Joao
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT II, 2015, 9042 : 35 - 48
  • [9] EXPLOITING THE SEMANTIC WEB FOR UNSUPERVISED SPOKEN LANGUAGE UNDERSTANDING
    Heck, Larry
    Hakkani-Tuer, Dilek
    [J]. 2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012, : 228 - 233
  • [10] Relationships at the heart of semantic Web: Modeling, discovering, and exploiting complex semantic relationships
    Sheth, A
    Arpinar, IB
    Kashyap, V
    [J]. ENHANCING THE POWER OF THE INTERNET, 2004, 139 : 63 - 94