Sentiment Analysis using Unlabeled Email data

Cited by: 0
Authors
Ali, Rayan Salah Hag [1]
El Gayar, Neamat [1]
Affiliation
[1] Heriot Watt Univ, Sch Math & Comp Sci, Dubai, U Arab Emirates
Keywords
Sentiment analysis; k-means; TFIDF; support vector machine
DOI
10.1109/iccike47802.2019.9004372
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentiment Analysis (SA), in the context of text mining, is the automated detection of subjective information such as opinions, attitudes, emotions, and feelings. Most prior work treats SA as a text classification problem, which requires labeled data to train a model. However, labeled datasets are hard to obtain, and in most cases labeling must be done by hand. A further issue is that labeled data does not port well across domains, which makes it hard to reuse the same labeled data in different applications; as a result, labeled data has to be created manually for each domain. In this paper, we apply sentiment analysis to the Enron email dataset. The aim of this work is to find the best techniques for labeling the dataset automatically and so avoid manual labeling. The automatically labeled data then serves as training data for a classifier built with a supervised machine learning algorithm. In the labeling phase, we compare lexicon labeling with k-means labeling; lexicon labeling gave better and more reliable results, so we used the lexicon-labeled dataset to train the classifiers. We used TF-IDF for feature extraction and trained Naive Bayes and Support Vector Machine (SVM) classifiers.
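The label-then-train pipeline described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the tiny sentiment lexicon and the toy emails are hypothetical, the IDF uses the smoothed form log((1 + n) / (1 + df)) + 1 (the default in scikit-learn's `TfidfVectorizer`, here without its L2 normalization), and the classifier is a multinomial Naive Bayes that treats TF-IDF weights as fractional counts. The k-means labeling and the SVM classifier compared in the paper are not shown.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL toy lexicon: the abstract does not name the lexicon the authors used.
LEXICON = {"great": 1, "thanks": 1, "good": 1, "happy": 1,
           "bad": -1, "problem": -1, "late": -1, "angry": -1}
PUNCT = ".,!?;:"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


def lexicon_label(text):
    """Sum lexicon scores: >0 -> 'pos', <0 -> 'neg', 0 -> None (left unlabeled)."""
    score = sum(LEXICON.get(t, 0) for t in tokenize(text))
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None


def fit_idf(docs):
    """Smoothed IDF: log((1 + n) / (1 + df)) + 1 for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    """Raw term frequency times IDF; terms unseen during fitting are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing, TF-IDF weights as counts."""
    vocab = set().union(*vectors)
    weight = defaultdict(Counter)
    for vec, y in zip(vectors, labels):
        weight[y].update(vec)
    prior, loglik = {}, {}
    for y, count in Counter(labels).items():
        total = sum(weight[y].values()) + alpha * len(vocab)
        prior[y] = math.log(count / len(labels))
        loglik[y] = {t: math.log((weight[y][t] + alpha) / total) for t in vocab}
    return prior, loglik


def predict_nb(model, vec):
    """Return the class with the highest log posterior score."""
    prior, loglik = model
    scores = {y: prior[y] + sum(w * loglik[y][t] for t, w in vec.items() if t in loglik[y])
              for y in prior}
    return max(scores, key=scores.get)


# Toy "emails" standing in for Enron messages (invented for illustration).
emails = [
    "Thanks for the great work on the report.",
    "Good news: the deal closed and everyone is happy.",
    "The shipment is late again, a real problem.",
    "Bad numbers this quarter and the client is angry.",
    "Please see the attached schedule.",  # lexicon score 0 -> stays unlabeled
]

# 1) Automatic labeling: keep only emails the lexicon can label.
labeled = [(tokenize(e), lexicon_label(e)) for e in emails if lexicon_label(e)]
docs, labels = [d for d, _ in labeled], [y for _, y in labeled]

# 2) TF-IDF features; 3) Naive Bayes trained on the lexicon-derived labels.
idf = fit_idf(docs)
model = train_nb([tfidf(d, idf) for d in docs], labels)

for text in ["Thanks, great meeting today.",
             "The client is angry about the late delivery."]:
    print(text, "->", predict_nb(model, tfidf(tokenize(text), idf)))
```

With these toy inputs, the first test email is classified `pos` and the second `neg`. Emails with a zero lexicon score are simply excluded from training here; how the paper handles neutral messages is not stated in the abstract.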
Pages: 329-334
Number of pages: 6