Estimating Sentiment via Probability and Information Theory

Cited by: 9
Authors
Labille, Kevin [1]
Alfarhood, Sultan [1]
Gauch, Susan [1]
Affiliations
[1] Univ Arkansas, Dept Comp Sci & Comp Engn, Fayetteville, AR 72701 USA
Keywords
Lexicons; Sentiment Analysis; Data Mining; Text Mining; Opinion Mining;
DOI
10.5220/0006072101210129
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Opinion detection and opinion analysis are challenging but important tasks. Such sentiment analysis can be performed with traditional supervised learning methods, such as naive Bayes classification and support vector machines (SVM), or with unsupervised approaches based on a lexicon. Because lexicon-based methods rely on an opinion dictionary, that is, a list of opinion-bearing or sentiment words, sentiment lexicons play a key role. Our work focuses on the task of generating such a lexicon. We propose several novel methods to automatically generate a general-purpose sentiment lexicon using a corpus-based approach. While most existing methods generate a lexicon from a list of seed sentiment words and a domain corpus, our work generates a lexicon from scratch by applying probabilistic and information-theoretic text mining techniques to a large, diverse corpus. We conclude by presenting an ensemble method that combines the two approaches. We evaluate and demonstrate the effectiveness of our methods by using the various automatically generated lexicons for sentiment analysis. Our best single lexicon achieves an accuracy of 87.60% and the ensemble approach achieves 88.75%, both statistically significant improvements over the 81.60% obtained with a widely used sentiment lexicon.
Pages: 121-129
Number of pages: 9