Real-Time Tweet Analytics Using Hybrid Hashtags on Twitter Big Data Streams

Cited by: 9
Authors
Gupta, Vibhuti [1]
Hewett, Rattikorn [1]
Affiliations
[1] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79415 USA
Keywords
Twitter; Hybrid Hashtags; Big Data stream; ontology; Apache Storm; sentiment analysis
DOI
10.3390/info11070341
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Twitter is a microblogging platform that generates large volumes of data with high velocity. This daily generation of unbounded and continuous data leads to Big Data streams that often require real-time, distributed, and fully automated processing. Hashtags, hyperlinked words in tweets, are widely used for tweet topic classification, retrieval, and clustering. Hashtags are also widely used for analyzing tweet sentiments, where emotions can be classified without context. However, despite the wide usage of hashtags, general tweet topic classification using hashtags is challenging due to their evolving nature, lack of context, slang, abbreviations, and non-standardized expression by users. Most existing approaches that utilize hashtags for tweet topic classification focus on extracting hashtag concepts from external lexicon resources to derive semantics. However, due to the rapid evolution and non-standardized expression of hashtags, the majority of these lexicon resources either suffer from the lack of hashtag words in their knowledge bases or use multiple resources at once to derive semantics, which makes them unscalable. Along with scalable and automated techniques for tweet topic classification using hashtags, there is also a requirement for real-time analytics approaches to handle huge and dynamic flows of textual streams generated by Twitter. To address these problems, this paper first presents a novel semi-automated technique that derives semantically relevant hashtags using a domain-specific knowledge base of topic concepts and combines them with the existing tweet-based hashtags to produce Hybrid Hashtags. Further, to deal with the speed and volume of Big Data streams of tweets, we present an online approach that updates the preprocessing and learning model incrementally in a real-time streaming environment using the distributed framework, Apache Storm.
Finally, to fully exploit the batch and stream environment performance advantages, we propose a comprehensive framework (Hybrid Hashtag-based Tweet topic classification (HHTC) framework) that combines batch and online mechanisms in the most effective way. Extensive experimental evaluations on a large volume of Twitter data show that the batch and online mechanisms, along with their combination in the proposed framework, are scalable, efficient, and provide effective tweet topic classification using hashtags.
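As a rough illustration of the Hybrid Hashtag idea described in the abstract, the following minimal Python sketch merges the hashtags an author typed with topic tags derived from a small knowledge base. Everything in it is an assumption for illustration: the `KNOWLEDGE_BASE` contents, the keyword-overlap matching rule, and the function names are hypothetical and are not taken from the paper, whose actual derivation technique is not specified here.

```python
import re

# Hypothetical toy knowledge base: topic concept -> trigger terms.
# The paper's real knowledge base and matching rules are not given in the abstract.
KNOWLEDGE_BASE = {
    "health": {"flu", "vaccine", "hospital"},
    "sports": {"goal", "match", "league"},
}

HASHTAG_RE = re.compile(r"#(\w+)")
WORD_RE = re.compile(r"[a-z]+")


def tweet_hashtags(text):
    """Return the hashtags the author typed, lowercased and without '#'."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def derived_hashtags(text):
    """Return topic concepts whose trigger terms occur in the tweet text."""
    words = set(WORD_RE.findall(text.lower()))
    return {concept for concept, terms in KNOWLEDGE_BASE.items() if words & terms}


def hybrid_hashtags(text):
    """Union of tweet-based hashtags and knowledge-base-derived hashtags."""
    return tweet_hashtags(text) | derived_hashtags(text)


if __name__ == "__main__":
    print(sorted(hybrid_hashtags("Got my flu vaccine today #staysafe")))
```

In this sketch a tweet with no typed hashtags can still receive topic tags, which is the gap the hybrid approach is meant to close; a real system would replace the keyword overlap with the paper's semantic matching.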
Pages: 23
Related papers (50 in total)
  • [1] Real-time tweet analytics using hybrid hashtags on twitter big data streams
    Gupta, Vibhuti
    Hewett, Rattikorn
    [J]. Information (Switzerland), 2020, 11 (07):
  • [2] Using a Rich Context Model for Real-Time Big Data Analytics in Twitter
    Sotsenko, Alisa
    Jansen, Marc
    Milrad, Marcelo
    Rana, Juwel
    [J]. 2016 IEEE 4TH INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD WORKSHOPS (FICLOUDW), 2016, : 228 - 233
  • [3] Real-Time Twitter Trend Analysis Using Big Data Analytics and Machine Learning Techniques
    Rodrigues, Anisha P.
    Fernandes, Roshan
    Bhandary, Adarsh
    Shenoy, Asha C.
    Shetty, Ashwanth
    Anisha, M.
    [J]. WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2021, 2021
  • [4] Real-Time Predicting Bursting Hashtags on Twitter
    Kong, Shoubin
    Mei, Qiaozhu
    Feng, Ling
    Zhao, Zhe
    [J]. WEB-AGE INFORMATION MANAGEMENT, WAIM 2014, 2014, 8485 : 268 - 271
  • [5] Mapping the Big Data Landscape: Technologies, Platforms and Paradigms for Real-Time Analytics of Data Streams
    Dubuc, Timothee
    Stahl, Frederic
    Roesch, Etienne B.
    [J]. IEEE ACCESS, 2021, 9 : 15351 - 15374
  • [6] Developing a Real-time Data Analytics Framework For Twitter Streaming Data
    Yadranjiaghdam, Babak
    Yasrobi, Seyedfaraz
    Tabrizi, Nasseh
    [J]. 2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017), 2017, : 329 - 336
  • [7] Real-Time Big Data Analytics: Applications and Challenges
    Mohamed, Nader
    Al-Jaroodi, Jameela
    [J]. 2014 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2014, : 305 - 310
  • [8] Using Big Data and Real-Time Analytics to Support Smart City Initiatives
    Souza, Arthur
    Figueredo, Mickael
    Cacho, Nelio
    Araujo, Daniel
    Prolo, Carlos A.
    [J]. IFAC PAPERSONLINE, 2016, 49 (30): : 257 - 262
  • [9] Towards Real-Time Road Traffic Analytics using Telco Big Data
    Costa, Constantinos
    Chatzimilioudis, Georgios
    Zeinalipour-Yazti, Demetrios
    Mokbel, Mohamed F.
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL WORKSHOP ON REAL-TIME BUSINESS INTELLIGENCE AND ANALYTICS, 2017,
  • [10] A Methodology of Real-Time Data Fusion for Localized Big Data Analytics
    Jabbar, Sohail
    Malik, Kaleem R.
    Ahmad, Mudassar
    Aldabbas, Omar
    Asif, Muhammad
    Khalid, Shehzad
    Han, Kijun
    Ahmed, Syed Hassan
    [J]. IEEE ACCESS, 2018, 6 : 24510 - 24520