Real-Time Social Media Analytics with Deep Transformer Language Models: A Big Data Approach

Times cited: 3
Authors
Ahmet, Ahmed [1]
Abdullah, Tariq [1]
Affiliation
[1] University of Derby, Department of Computer Science, Derby, England
Keywords
Real-time analytics; social media; deep learning; machine learning; transfer learning; big data
DOI
10.1109/BigDataSE50710.2020.00014
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Utilisation of transfer learning with deep language models is regarded as one of the most important developments in deep learning. Their application to real-time, high-velocity, high-volume user-generated data has been elusive because the unprecedented size and complexity of these models result in substantial computational overhead. Recent iterations of these architectures have produced significantly distilled models with state-of-the-art performance and reduced resource requirements. We apply deep transformer language models to user-generated data alongside a robust text normalization pipeline, addressing what is considered the Achilles' heel of deep learning on user-generated text data: data normalization. In this paper, we propose a framework for the ingestion, analysis and storage of real-time data streams. A case study in sentiment analysis and offensive/hateful language detection is used to evaluate the framework. We demonstrate inference on a large Twitter dataset using CPU and GPU clusters, highlighting the viability of the fine-tuned distilled language model for high-volume data. The fine-tuned model significantly outperforms the previous state of the art on several benchmark datasets, providing a powerful model that can be utilized for a variety of downstream tasks. To our knowledge, this is the only study demonstrating powerful transformer language models for real-time social media stream analytics in a distributed setting.
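The abstract identifies text normalization as the critical preprocessing step for user-generated text but does not spell out its rules. The following is a minimal sketch of tweet normalization of the kind often applied before tokenization for a transformer model. The function name `normalize_tweet`, the placeholder tokens `<url>` and `<user>`, and every individual rule are assumptions for illustration only, not the authors' pipeline.

```python
import html
import re

# Hypothetical normalization rules; the paper's actual pipeline may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")   # web links
MENTION_RE = re.compile(r"@\w+")                # @username mentions
HASHTAG_RE = re.compile(r"#(\w+)")              # #hashtag -> captured word
ELONGATED_RE = re.compile(r"([a-zA-Z])\1{2,}")  # a letter repeated 3+ times
WHITESPACE_RE = re.compile(r"\s+")


def normalize_tweet(text: str) -> str:
    """Normalize a raw tweet before tokenization (illustrative sketch)."""
    text = html.unescape(text)                  # "&amp;" -> "&"
    text = URL_RE.sub("<url>", text)            # replace links with a placeholder
    text = MENTION_RE.sub("<user>", text)       # anonymize user mentions
    text = HASHTAG_RE.sub(r"\1", text)          # keep the hashtag word, drop '#'
    text = ELONGATED_RE.sub(r"\1\1", text)      # "soooo" -> "soo" (letters only, so "1000" is kept)
    text = WHITESPACE_RE.sub(" ", text).strip() # collapse runs of whitespace
    return text.lower()                         # placeholders are already lowercase


if __name__ == "__main__":
    print(normalize_tweet("@Alice Sooooo happy!!! &amp; see https://t.co/abc #Blessed"))
```

Running the example prints `<user> soo happy!!! & see <url> blessed`. Collapsing elongations to two characters rather than one keeps a trace of emphasis that may help sentiment classification. The simple mention rule would also rewrite part of an e-mail address, which a production pipeline would need to handle.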
Pages: 41-48
Page count: 8
Related papers (50 in total)
  • [1] An incremental approach for real-time Big Data visual analytics
    Garcia, Ignacio
    Casado, Ruben
    Bouchachia, Abdelhamid
    2016 IEEE 4th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), 2016, pp. 177-182
  • [2] Big Data Analytics of Geosocial Media for Planning and Real-Time Decisions
    Rathore, M. Mazhar
    Paul, Anand
    Ahmad, Awais
    Imran, Muhammad
    Guizani, Mohsen
    2017 IEEE International Conference on Communications (ICC), 2017
  • [3] Big data analytics on social networks for real-time depression detection
    Angskun, Jitimon
    Tipprasert, Suda
    Angskun, Thara
    Journal of Big Data, 2022, vol. 9, no. 1
  • [4] Real-Time Big Data Analytics: Applications and Challenges
    Mohamed, Nader
    Al-Jaroodi, Jameela
    2014 International Conference on High Performance Computing & Simulation (HPCS), 2014, pp. 305-310
  • [5] Improved Big Data Analytics Solution Using Deep Learning Model and Real-Time Sentiment Data Analysis Approach
    Chen, Chun-I Philip
    Zheng, Jiangbin
    Advances in Brain Inspired Cognitive Systems (BICS 2018), 2018, vol. 10989, pp. 579-588
  • [6] A Streamlined Approach for Real-Time Data Analytics
    Arora, Shruti
    Rani, Rinkle
    Proceedings of the 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), 2018, pp. 732-736
  • [7] A Methodology of Real-Time Data Fusion for Localized Big Data Analytics
    Jabbar, Sohail
    Malik, Kaleem R.
    Ahmad, Mudassar
    Aldabbas, Omar
    Asif, Muhammad
    Khalid, Shehzad
    Han, Kijun
    Ahmed, Syed Hassan
    IEEE Access, 2018, vol. 6, pp. 24510-24520
  • [8] MOLESTRA: A Multi-Task Learning Approach for Real-Time Big Data Analytics
    Demertzis, Konstantinos
    Iliadis, Lazaros
    Anezakis, Vardis-Dimitris
    2018 Innovations in Intelligent Systems and Applications (INISTA), 2018
  • [9] Logical big data integration and near real-time data analytics
    Silva, Bruno
    Moreira, Jose
    Costa, Rogerio Luis de C.
    Data & Knowledge Engineering, 2023, vol. 146