A spark-based big data analysis framework for real-time sentiment prediction on streaming data

Cited by: 8
Author
Kilinc, Deniz [1]
Affiliation
[1] Manisa Celal Bayar Univ, Fac Technol, Dept Software Engn, TR-45400 Manisa, Turkey
Source
SOFTWARE-PRACTICE & EXPERIENCE | 2019 / Vol. 49 / Issue 09
Keywords
Big Data machine learning; fake account detection; real-time sentiment analysis; streaming data; Twitter streaming;
DOI
10.1002/spe.2724
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Many data sources produce large volumes of data, and the nature of Big Data calls for new distributed processing approaches to extract valuable information. Real-time sentiment analysis is one of the most demanding research areas and requires powerful Big Data analytics tools such as Spark. Prior literature surveys have shown that, although there are many conventional sentiment analysis studies, only a few works perform sentiment analysis in real time. One major factor that affects the quality of real-time sentiment analysis is the trustworthiness of the generated data. More precisely, it is a valuable research question to determine whether the owner that generates a sentiment is genuine or not. Since data generated by fake personalities may decrease the accuracy of the outcome, an intelligent service that can identify the source of the data is one of the key points in the analysis. In this context, we add a fake account detection service to the proposed framework. Both the sentiment analysis and the fake account detection systems are trained and tested using the Naive Bayes model from Apache Spark's machine learning library. The developed system consists of four integrated software components, i.e., (i) a machine learning and streaming service for sentiment prediction, (ii) a Twitter streaming service to retrieve tweets, (iii) a Twitter fake account detection service to assess the owner of the retrieved tweet, and (iv) a real-time reporting and dashboard component to visualize the results of sentiment analysis. The sentiment classification performance of the system in offline and real-time modes is 86.77% and 80.93%, respectively.
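The pipeline outlined in the abstract (screen the tweet's author, then predict sentiment with Naive Bayes) can be illustrated with a minimal sketch. It is not the author's implementation: it uses plain Python instead of Apache Spark's machine learning library, an in-memory list instead of a Twitter stream, and a hypothetical `is_fake_account` predicate in place of the trained fake account detection service. All names and the toy training data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.class_counts = Counter()            # training documents per label
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.smoothing * vocab_size
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + self.smoothing) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify_stream(model, tweets, is_fake_account):
    """Yield (text, label) for tweets whose author passes the fake-account check."""
    for author, text in tweets:
        if is_fake_account(author):
            continue  # drop tweets from accounts flagged as fake
        yield text, model.predict(text)


# Toy training data, invented for illustration only.
train_texts = ["great phone love it", "love this great service",
               "terrible battery hate it", "hate this awful service"]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesSentiment().fit(train_texts, train_labels)

# Simulated stream of (author, text) pairs; "bot_" is a hypothetical marker.
stream = [("alice", "love the great battery"),
          ("bot_123", "hate hate hate"),
          ("bob", "awful phone hate it")]
results = list(classify_stream(model, stream, lambda a: a.startswith("bot_")))
```

In this sketch the tweet from `bot_123` is discarded before classification, so `results` holds only the two remaining tweets with their predicted labels. In the described framework, both the sentiment classifier and the fake account check would instead be Naive Bayes models trained with Apache Spark's machine learning library.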
Pages: 1352-1364
Number of pages: 13
Related papers
50 in total
  • [1] A Scalable Streaming Big Data Architecture for Real-Time Sentiment Analysis
    Ayvaz, Serkan
    Shiha, Mohammed O.
    [J]. PROCEEDINGS OF 2018 2ND INTERNATIONAL CONFERENCE ON CLOUD AND BIG DATA COMPUTING (ICCBDC 2018), 2018, : 47 - 51
  • [2] A new Apache Spark-based framework for big data streaming forecasting in IoT networks
    Fernandez-Gomez, Antonio M.
    Gutierrez-Aviles, David
    Troncoso, Alicia
    Martinez-Alvarez, Francisco
    [J]. JOURNAL OF SUPERCOMPUTING, 2023, 79 (10): : 11078 - 11100
  • [3] A new Apache Spark-based framework for big data streaming forecasting in IoT networks
    Antonio M. Fernández-Gómez
    David Gutiérrez-Avilés
    Alicia Troncoso
    Francisco Martínez-Álvarez
    [J]. The Journal of Supercomputing, 2023, 79 : 11078 - 11100
  • [4] Research on High-Performance Real-time Data Analysis System Based on Spark Streaming in Big Data Environment
    Wang, Jialin
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 124 : 140 - 141
  • [5] Design and Implementation of Real-Time Video Big Data Platform based on Spark Streaming
    Chen, Hongjun
    Luo, Fuqiang
    Zhao, Liheng
    Li, Yao
    [J]. INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND APPLICATION ENGINEERING (CSAE), 2017, 190 : 733 - 739
  • [6] Real-Time Data ETL Framework for Big Real-Time Data Analysis
    Li, Xiaofang
    Mao, Yingchi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 1289 - 1294
  • [7] A Dynamic Spark-based Classification Framework for Imbalanced Big Data
    Nahla B. Abdel-Hamid
    Sally ElGhamrawy
    Ali El Desouky
    Hesham Arafat
    [J]. Journal of Grid Computing, 2018, 16 : 607 - 626
  • [8] A Dynamic Spark-based Classification Framework for Imbalanced Big Data
    Abdel-Hamid, Nahla B.
    ElGhamrawy, Sally
    El Desouky, Ali
    Arafat, Hesham
    [J]. JOURNAL OF GRID COMPUTING, 2018, 16 (04) : 607 - 626
  • [9] Efficient Spark-Based Framework for Big Geospatial Data Query Processing and Analysis
    Aljawarneh, Isam Mashhour
    Bellavista, Paolo
    Corradi, Antonio
    Montanari, Rebecca
    Foschini, Luca
    Zanotti, Andrea
    [J]. 2017 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2017, : 851 - 856
  • [10] A Framework for Real-time Sentiment Analysis of Big Data Generated by Social Media Platforms
    Fahd, Kiran
    Parvin, Sazia
    de Souza-Daw, Anthony
    [J]. 2021 31ST INTERNATIONAL TELECOMMUNICATION NETWORKS AND APPLICATIONS CONFERENCE (ITNAC), 2021, : 30 - 33