STREAM TEXT DATA ANALYSIS ON TWITTER USING APACHE SPARK STREAMING

Authors
Hakdagli, Ozlem [1]
Ozcan, Caner [2]
Ogul, Iskender Ulgen [3]
Affiliations
[1] Karabuk Univ, Dept. of Computer Engineering, Karabuk, Turkey
[2] Purdue Univ, Dept. of Electrical & Computer Engineering, W Lafayette, IN 47907 USA
[3] Izmir Institute of Technology (Izmir Yuksek Teknol Enstitusu), Dept. of Computer Engineering, Izmir, Turkey
Keywords
Apache Spark; Spark Streaming; Twitter; Machine Learning; Text Mining
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
With today's rapidly developing technology, people can access and produce information very quickly. This information is created instantly, entered into data systems, and continuously updated. Streaming data sources can yield valuable analysis results when they are processed with targeted methods. In this study, text was chosen as the data domain for analyzing data generated in real time, and Twitter, the richest platform for real-time text data, was used as the source. Twitter continuously generates large volumes of diverse data and makes them openly available through an API. The stream analysis environment of Apache Spark (Spark Streaming), a framework that also provides machine learning algorithms, was used to analyze these data. Situation analysis was performed using the Support Vector Machine, Decision Tree, and Logistic Regression algorithms available in this environment. The results are presented in tables.
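The abstract describes classifying a live tweet stream batch by batch. The pure-Python sketch below illustrates that idea under stated assumptions: it does not use Spark's API, and the toy tweets, the binary labels, the feature-space size, and every function name (featurize, train_logistic_regression, predict, micro_batches) are hypothetical. Each tweet becomes a hashed term-count vector (the hashing trick, comparable in spirit to Spark MLlib's HashingTF), a logistic regression is fitted with stochastic gradient descent, and an incoming stream is cut into micro-batches that are classified one batch at a time. The SVM and decision-tree models mentioned in the abstract are not shown.

```python
import math
import zlib
from typing import Dict, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical size of the hashed feature space (an assumption, not from the paper).
NUM_FEATURES = 1 << 10

# Hypothetical toy training set: 1 = positive, 0 = negative.
TRAIN: List[Tuple[str, int]] = [
    ("great match today love it", 1),
    ("happy with the new phone", 1),
    ("love this song so much", 1),
    ("terrible service very angry", 0),
    ("worst day ever so sad", 0),
    ("hate waiting in traffic", 0),
]


def featurize(text: str) -> Dict[int, float]:
    """Hashing-trick bag of words: each lowercase token is hashed to a bucket,
    and the bucket stores how often such tokens occur in the text."""
    counts: Dict[int, float] = {}
    for token in text.lower().split():
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        idx = zlib.crc32(token.encode("utf-8")) % NUM_FEATURES
        counts[idx] = counts.get(idx, 0.0) + 1.0
    return counts


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: List[float], bias: float, x: Dict[int, float]) -> float:
    """Linear score w.x + b over a sparse feature dict."""
    return bias + sum(weights[i] * v for i, v in x.items())


def train_logistic_regression(
    data: Iterable[Tuple[str, int]], epochs: int = 50, lr: float = 0.5
) -> Tuple[List[float], float]:
    """Fit logistic regression with plain stochastic gradient descent on log-loss."""
    weights = [0.0] * NUM_FEATURES
    bias = 0.0
    examples = [(featurize(text), label) for text, label in data]
    for _ in range(epochs):
        for x, label in examples:
            err = sigmoid(score(weights, bias, x)) - label  # d(log-loss)/dz
            bias -= lr * err
            for i, v in x.items():
                weights[i] -= lr * err * v
    return weights, bias


def predict(weights: List[float], bias: float, text: str) -> int:
    """Return 1 if the estimated P(label = 1 | text) >= 0.5, else 0."""
    return 1 if sigmoid(score(weights, bias, featurize(text))) >= 0.5 else 0


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Cut a (possibly unbounded) stream into consecutive fixed-size batches.
    Spark Streaming cuts by a time interval instead; a count is used here
    only to keep the sketch deterministic."""
    batch: List[T] = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partial batch


if __name__ == "__main__":
    w, b = train_logistic_regression(TRAIN)
    incoming = iter(["love the match", "so angry and sad", "happy day", "hate this"])
    for batch in micro_batches(incoming, batch_size=2):
        print([(text, predict(w, b, text)) for text in batch])
```

A linear SVM could be sketched the same way by replacing the log-loss gradient with the hinge-loss subgradient (with labels recoded to -1/+1). In an actual Spark Streaming job, the batch length would be the time-based batch duration given to the StreamingContext rather than an item count.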
Pages: 4