A domain transferable lexicon set for Twitter sentiment analysis using a supervised machine learning approach

Cited by: 90
Authors
Ghiassi, M. [1]
Lee, S. [2]
Affiliations
[1] Santa Clara Univ, Santa Clara, CA 95053 USA
[2] Stella Technol, San Jose, CA 95119 USA
Keywords
Twitter sentiment analysis; Domain transferability; n-gram analysis; Machine learning; Dynamic artificial neural networks (DAN2); CLASSIFICATION;
DOI
10.1016/j.eswa.2018.04.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Twitter messaging service has become a platform for customers and news consumers to express sentiments. Accurately capturing these sentiments has been challenging for researchers. The traditional approaches to Twitter Sentiment Analysis (TSA) include dictionary-based methods and supervised machine learning tools for sentiment classification. This research follows the supervised machine learning approach. A major challenge for the machine learning approach is feature selection, which is often domain dependent. We address this specific challenge and present a novel approach to identify a lexicon set unique to TSA. We show that this Twitter Specific Lexicon Set (TSLS) is small and, most importantly, domain transferable. This identification process generates a collection of vectorized tweets for input to machine learning tools. In traditional approaches, this vectorization often results in a highly sparse input matrix, which produces low accuracy measures. In this research, we hierarchically reduce the feature set to a small set of seven "meta features" to reduce sparsity. We show that TSA based on these features can produce highly accurate results using two machine learning tools, a dynamic architecture for neural networks (DAN2) and SVM, as measured by recall, precision, and F-1 metrics (the harmonic average of precision and recall). Our results show that a Twitter Generic Feature Set (TGFS) derived from two datasets (@JustinBieber and @Starbucks) is domain transferable and, when combined with only a few Twitter Domain Specific Features (TDSF) (less than 3%), can produce excellent sentiment classification values. We evaluate the effectiveness and transferability of the TGFS across three new and distinct domains (@GovChristie, @SouthwestAir, and @VerizonWireless). © 2018 Elsevier Ltd. All rights reserved.
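For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how precision, recall, and F-1 relate for a single class. It is not taken from the paper: the function name `precision_recall_f1` and the count-based inputs (`tp`, `fp`, `fn`) are assumptions made for illustration only.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F-1 for one class from raw counts.

    tp, fp, fn are hypothetical counts of true positives, false positives,
    and false negatives; they are illustrative inputs, not data from the paper.
    """
    # Precision: share of predicted positives that are actually positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were predicted positive.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-1: harmonic mean of precision and recall (0.0 when both are zero).
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = precision_recall_f1(tp=8, fp=2, fn=2)
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Because F-1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is commonly reported alongside the two individual measures.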
Pages: 197-216
Number of pages: 20
    Huy Nguyen
    [J]. ADVANCED COMPUTATIONAL METHODS FOR KNOWLEDGE ENGINEERING, 2015, 358 : 279 - 289