Sentiment Analysis using Unlabeled Email data

Authors
Ali, Rayan Salah Hag [1]
El Gayar, Neamat [1]
Affiliations
[1] Heriot Watt Univ, Sch Math & Comp Sci, Dubai, U Arab Emirates
Keywords
Sentiment analysis; k-means; TF-IDF; support vector machine
DOI
10.1109/iccike47802.2019.9004372
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sentiment Analysis (SA) in the context of text mining is an automated process for detecting subjective information such as opinions, attitudes, emotions, and feelings. Most prior work in SA views it as a text classification problem that requires labeled data to train the model. However, labeled datasets are hard to obtain, and in most cases the labeling must be done by hand. Another issue is that the lack of portability across domains makes it hard to reuse the same labeled data in different applications, so labeled data must be created manually for each domain. In this paper, we apply sentiment analysis to the Enron email dataset. This work aims to find the best techniques for labeling the dataset automatically, avoiding manual labeling. The resulting training data is used to build a classifier with a supervised machine learning algorithm. In the labeling phase, we compare lexicon labeling with k-means labeling. Lexicon labeling gave better and more reliable results, so we used the lexicon-labeled dataset to train the classifier. We used TF-IDF for feature extraction to train Naive Bayes and support vector machine (SVM) classifiers.
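The two steps the abstract names, lexicon labeling and TF-IDF feature extraction, can be illustrated with a minimal sketch. This is not the authors' implementation: the word lists, the three example emails, the `neutral` fallback label, and the particular TF-IDF variant (term frequency normalized by document length, times `log(N / df)`) are all assumptions made for illustration, and classifier training is left out.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption); a real run would load a published
# sentiment lexicon rather than these hand-picked words.
POSITIVE = {"great", "thanks", "good", "happy"}
NEGATIVE = {"bad", "problem", "sorry", "late"}

PUNCT = ".,!?;:"


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCT).lower() for word in text.split())
    return [t for t in tokens if t]


def lexicon_label(text):
    """Label a text by counting positive and negative lexicon hits."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # assumed fallback when the counts tie


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = (count / document length) * log(N / document frequency),
    one common TF-IDF variant among several.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n_docs / doc_freq[t])
             for t, c in counts.items()}
        )
    return vectors


# Made-up example emails (not taken from the Enron dataset).
emails = [
    "Thanks, the report looks great!",
    "Sorry, the shipment is late.",
    "Meeting moved to Monday.",
]
labels = [lexicon_label(e) for e in emails]
vectors = tfidf(emails)
```

The `labels` and `vectors` produced here are the kind of automatically labeled training pairs that a supervised classifier such as Naive Bayes or an SVM could then be fitted on.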
Pages: 329-334
Page count: 6