HiSAT: Hierarchical Framework for Sentiment Analysis on Twitter Data

Cited by: 1
Authors
Kommu, Amrutha [1]
Patel, Snehal [1]
Derosa, Sebastian [1]
Wang, Jiayin [1]
Varde, Aparna S. [1]
Affiliations
[1] Montclair State Univ, Montclair, NJ 07043 USA
Funding
U.S. National Science Foundation
Keywords
Bayesian models; Knowledge discovery; Logistic Regression; NLP; Opinion mining; Random Forest; Social media; Text mining; Emotion recognition features
DOI
10.1007/978-3-031-16072-1_28
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Social media websites such as Twitter have become so indispensable that people use them almost daily to share their emotions, opinions, suggestions and thoughts. Motivated by this behavior, the purpose of this study is to define an approach that automatically classifies tweets into two main classes, namely hate speech and non-hate speech. Such classification provides a valuable source of information for analyzing and understanding target audiences and for spotting marketing trends. We thus propose HiSAT, a Hierarchical framework for Sentiment Analysis on Twitter data. Sentiments and opinions in tweets are highly unstructured and do not have a properly defined sequence. They constitute heterogeneous data from many sources in different formats, and express positive, negative, or neutral sentiment. Hence, in HiSAT we conduct Natural Language Processing encompassing tokenization, stemming and lemmatization techniques that convert text to tokens, as well as Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) techniques that convert text sentences into numeric vectors. These vectors are then fed as inputs to machine learning algorithms within the HiSAT framework; more specifically, Random Forest, Logistic Regression and Naive Bayes are used as binary text classifiers to separate hate speech from non-hate speech in the tweets. Results of experiments performed with the HiSAT framework show that Random Forest outperforms the other classifiers in predicting the correct labels (with accuracy above 95%). We present the HiSAT approach, its implementation and experiments, along with related work and ongoing research.
Pages: 376-392
Number of pages: 17
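
Illustrative sketch (not the authors' implementation)
The abstract describes a pipeline of tokenization, Bag-of-Words and TF-IDF vectorization, and binary classification. The block below is a minimal sketch of that flow using only the Python standard library. Every name in it (`tokenize`, `bag_of_words`, `tfidf_vectors`, `NaiveBayes`, `TOY_TWEETS`) is hypothetical, the four example tweets are invented for illustration, and the TF-IDF formula is one common variant rather than one confirmed by the source. Stemming and lemmatization are omitted, and only Naive Bayes is shown out of the three classifiers named in the abstract. The classifier here is trained on raw token counts; the TF-IDF vectors are computed separately to show the weighting.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a tweet, drop @mentions and URLs, and split it into word tokens."""
    text = re.sub(r"(@\w+|https?://\S+)", " ", text.lower())
    return re.findall(r"[a-z']+", text)

def bag_of_words(tokens):
    """Bag-of-Words: map each token to its raw count in one document."""
    return Counter(tokens)

def tfidf_vectors(docs_tokens):
    """TF-IDF with tf = count / doc length and idf = log(N / df) + 1 (an assumed variant)."""
    n_docs = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = []
    for tokens in docs_tokens:
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in bag_of_words(tokens).items()})
    return vectors, idf

class NaiveBayes:
    """Multinomial Naive Bayes over token counts with Laplace (add-one) smoothing."""

    def fit(self, docs_tokens, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for tokens in docs_tokens for t in tokens}
        self.prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs_tokens, labels) if y == c]
            self.prior[c] = math.log(len(class_docs) / len(docs_tokens))
            self.counts[c] = Counter(t for d in class_docs for t in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict(self, tokens):
        vocab_size = len(self.vocab)

        def log_score(c):
            score = self.prior[c]
            for t in tokens:
                if t in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((self.counts[c][t] + 1) / (self.totals[c] + vocab_size))
            return score

        return max(self.classes, key=log_score)

# Invented toy data: label 1 = hate speech, label 0 = non-hate speech.
TOY_TWEETS = [
    ("I hate those people, they are awful", 1),
    ("they are awful and I hate them", 1),
    ("what a lovely sunny day", 0),
    ("love this lovely song, great day", 0),
]

docs = [tokenize(text) for text, _ in TOY_TWEETS]
labels = [label for _, label in TOY_TWEETS]
vectors, idf = tfidf_vectors(docs)
model = NaiveBayes().fit(docs, labels)
prediction = model.predict(tokenize("@user I hate this awful day https://example.com"))
```

On this toy data, terms that occur in fewer tweets receive a larger IDF weight than terms that occur in many. A real experiment would need a labeled tweet corpus, a held-out test split, and the stemming or lemmatization steps that the abstract lists.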