Weakly supervised topic sentiment joint model with word embeddings

Cited by: 28
Authors
Fu, Xianghua [1]
Sun, Xudong [1]
Wu, Haiying [1]
Cui, Laizhong [1]
Huang, Joshua Zhexue [1]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
Keywords
Sentiment analysis; Topic model; Topic sentiment joint model; Word embeddings;
DOI
10.1016/j.knosys.2018.02.012
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A topic sentiment joint model aims to detect topics and sentiment simultaneously from online reviews. Most existing topic sentiment modeling algorithms are based on the state-of-the-art latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA), which infer sentiment and topic distributions from the co-occurrence of words. These methods have been used successfully for topic and sentiment analysis. However, when the training corpus is small or the documents are short, the textual features become sparse, and the inferred sentiment and topic distributions may be unsatisfactory. In this paper, we propose a novel topic sentiment joint model, the weakly supervised topic sentiment joint model with word embeddings (WS-TSWE), which incorporates word embeddings and the HowNet lexicon simultaneously to improve topic identification and sentiment recognition. The main contributions of WS-TSWE are twofold. (1) Existing models generate words only from the sentiment-topic-to-word Dirichlet multinomial component, whereas the WS-TSWE model replaces it with a mixture of two components: a Dirichlet multinomial component and a word embeddings component. Because the word embeddings are trained on a very large corpus and can extend the semantic information of the words, they help alleviate the text sparsity problem. (2) Most previous models incorporate sentiment knowledge in the beta priors, which are usually set from a dictionary and rely entirely on prior domain knowledge to identify positive and negative words. In contrast, the WS-TSWE model calculates the sentiment orientation of each word with the HowNet lexicon and automatically infers sentiment-based beta priors for sentiment analysis and opinion mining. Furthermore, we implement WS-TSWE with Gibbs sampling. Experimental results on Chinese and English data sets show that WS-TSWE performs well in the task of detecting sentiment and topics simultaneously. (c) 2018 Elsevier B.V. All rights reserved.
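The abstract does not give the model's equations, so the following is only a minimal sketch of the two ideas it names: a word probability formed as a mixture of a Dirichlet multinomial component and a word embeddings component, and beta priors derived from lexicon polarity scores. Everything concrete here is an assumption made for illustration: the softmax-over-dot-products form of the embedding component, the fixed mixing weight `lam`, the prior formula in `lexicon_beta`, all function names, and the toy polarity scores and vectors (which are made up and not taken from HowNet or from the paper).

```python
import math


def dirichlet_multinomial_prob(word, counts, beta):
    """Smoothed count estimate (n_w + beta_w) / (n + sum(beta)) for one sentiment-topic pair."""
    total = sum(counts.values()) + sum(beta.values())
    return (counts.get(word, 0) + beta[word]) / total


def embedding_prob(word, topic_vec, embeddings):
    """Assumed form: softmax over the vocabulary of topic-vector / word-vector dot products."""
    scores = {w: sum(a * b for a, b in zip(topic_vec, v)) for w, v in embeddings.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    norm = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[word] - top) / norm


def mixture_prob(word, counts, beta, topic_vec, embeddings, lam):
    """Mix the two components; lam is the (assumed) weight of the embedding component."""
    dm = dirichlet_multinomial_prob(word, counts, beta)
    emb = embedding_prob(word, topic_vec, embeddings)
    return (1.0 - lam) * dm + lam * emb


def lexicon_beta(polarity, sentiment, base=0.01):
    """Hypothetical prior rule: scale a base prior by agreement between a word's
    polarity score in (-1, 1) and the sentiment label (+1 positive, -1 negative)."""
    return {w: base * (1.0 + sentiment * score) for w, score in polarity.items()}


# Toy inputs, invented for this example only.
POLARITY = {"good": 0.8, "bad": -0.7, "screen": 0.0}
EMBEDDINGS = {"good": [1.0, 0.2], "bad": [-1.0, 0.2], "screen": [0.0, 1.0]}
COUNTS = {"good": 3, "screen": 1}   # word counts assigned to one positive sentiment-topic pair
TOPIC_VEC = [0.9, 0.1]              # vector for that sentiment-topic pair

beta_pos = lexicon_beta(POLARITY, sentiment=+1)
for w in POLARITY:
    print(w, mixture_prob(w, COUNTS, beta_pos, TOPIC_VEC, EMBEDDINGS, lam=0.6))
```

In this sketch the embedding component still assigns probability mass to words that are rare or absent in the counts, which is the intuition the abstract gives for why embeddings help with sparse, short texts. In a Gibbs sampler such a word probability would be one factor in the conditional used to resample each word's sentiment and topic assignment.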
Pages: 43-54
Number of pages: 12