Estimating Sentiment via Probability and Information Theory

Cited by: 9
Authors
Labille, Kevin [1]
Alfarhood, Sultan [1]
Gauch, Susan [1]
Affiliation
[1] Univ Arkansas, Dept Comp Sci & Comp Engn, Fayetteville, AR 72701 USA
Keywords
Lexicons; Sentiment Analysis; Data Mining; Text Mining; Opinion Mining
DOI
10.5220/0006072101210129
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Opinion detection and opinion analysis are challenging but important tasks. Sentiment analysis can be performed with traditional supervised learning methods, such as naive Bayes classification and support vector machines (SVMs), or with unsupervised approaches based on a lexicon. Because lexicon-based sentiment analysis methods rely on an opinion dictionary, that is, a list of opinion-bearing or sentiment words, sentiment lexicons play a key role. Our work focuses on generating such a lexicon. We propose several novel methods to automatically generate a general-purpose sentiment lexicon using a corpus-based approach. While most existing methods generate a lexicon from a list of seed sentiment words and a domain corpus, our work instead generates a lexicon from scratch, applying probabilistic techniques and information-theoretic text mining techniques to a large, diverse corpus. We conclude by presenting an ensemble method that combines the two approaches. We evaluate and demonstrate the effectiveness of our methods by using the automatically generated lexicons for sentiment analysis. Our best single lexicon achieves an accuracy of 87.60% and the ensemble approach achieves 88.75%, both statistically significant improvements over the 81.60% obtained with a widely used sentiment lexicon.
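The abstract does not give the scoring formulas, so the following is only a rough sketch of the general idea, not the authors' method. It assumes a corpus of documents already labeled positive ("pos") or negative ("neg"), which is an assumption on our part. It builds two lexicons without any seed words: a probabilistic score, 2 * P(pos | w) - 1 with add-alpha smoothing and equal class priors, and an information-theoretic score that keeps the word's polarity but weights it by 1 minus the binary entropy of P(pos | w), so words with a skewed class distribution score higher. The ensemble here is a simple average of the two lexicons, and text is classified by the sign of its summed word scores; both choices are illustrative assumptions. All function names are hypothetical.

```python
import math
from collections import Counter


def class_counts(docs):
    """Count, per word, how many positive and negative documents contain it."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, label in docs:
        words = set(text.lower().split())  # document frequency, not raw counts
        if label == "pos":
            pos.update(words)
            n_pos += 1
        else:
            neg.update(words)
            n_neg += 1
    return pos, neg, n_pos, n_neg


def probabilistic_lexicon(docs, alpha=1.0):
    """Hypothetical probabilistic score in [-1, 1]: 2 * P(pos | w) - 1.

    P(pos | w) is estimated from smoothed per-class document rates,
    assuming equal class priors.
    """
    pos, neg, n_pos, n_neg = class_counts(docs)
    lexicon = {}
    for w in set(pos) | set(neg):
        p = (pos[w] + alpha) / (n_pos + 2 * alpha)  # rate in positive docs
        q = (neg[w] + alpha) / (n_neg + 2 * alpha)  # rate in negative docs
        lexicon[w] = 2 * p / (p + q) - 1
    return lexicon


def entropy_lexicon(docs, alpha=1.0):
    """Hypothetical information-theoretic score: sign(polarity) * (1 - H(w)).

    H(w) is the binary entropy (in bits) of P(pos | w); a word that occurs
    equally in both classes has H = 1 and therefore scores 0.
    """
    lexicon = {}
    for w, s in probabilistic_lexicon(docs, alpha).items():
        p = (s + 1) / 2  # recover P(pos | w) from the probabilistic score
        h = -sum(x * math.log2(x) for x in (p, 1 - p) if x > 0)
        lexicon[w] = math.copysign(1 - h, s)
    return lexicon


def ensemble(*lexicons):
    """Assumed combination rule: average each word's score across lexicons."""
    words = set().union(*lexicons)
    return {w: sum(lex.get(w, 0.0) for lex in lexicons) / len(lexicons) for w in words}


def classify(text, lexicon):
    """Lexicon-based classification: sign of the summed word scores (ties -> pos)."""
    score = sum(lexicon.get(w, 0.0) for w in text.lower().split())
    return "pos" if score >= 0 else "neg"
```

For example, with a toy corpus `[("great movie loved it", "pos"), ("terrible movie hated it", "neg")]`, "great" gets a positive score, "terrible" a negative one, and "movie" (present in both classes) scores 0 in both lexicons. Averaging is only one way to combine lexicons; a weighted vote or a learned combination would fit the same interface.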
Pages: 121-129
Number of pages: 9