Automated Detection of Vaping-Related Tweets on Twitter During the 2019 EVALI Outbreak Using Machine Learning Classification

Cited by: 2
Authors
Ren, Yang [1]
Wu, Dezhi [2]
Singh, Avineet [1]
Kasson, Erin [3]
Huang, Ming [4]
Cavazos-Rehg, Patricia [3]
Affiliations
[1] Univ South Carolina, Dept Comp Sci & Engn, Columbia, SC 29208 USA
[2] Univ South Carolina, Dept Integrated Informat Technol, Columbia, SC 29208 USA
[3] Washington Univ, Sch Med, Dept Psychiat, St Louis, MO 63110 USA
[4] Mayo Clin, Dept Artificial Intelligence & Informat, Rochester, MN 55905 USA
Source
FRONTIERS IN BIG DATA | 2022 / Vol. 5
Funding
National Institutes of Health
Keywords
vaping; e-cigarette; Twitter; machine learning; deep learning; classification; detection; EVALI
DOI
10.3389/fdata.2022.770585
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline Classification Code
0812
Abstract
There are increasingly strict regulations surrounding the purchase and use of combustible tobacco products (i.e., cigarettes); simultaneously, the use of other tobacco products, including e-cigarettes (i.e., vaping products), has dramatically increased. However, public attitudes toward vaping vary widely, and the health effects of vaping are still largely unknown. As a popular social media platform, Twitter contains rich information shared by users about their behaviors and experiences, including their opinions on vaping. Identifying vaping-related tweets manually in order to extract this information is very challenging. In the current study, we developed a detection model to accurately identify vaping-related tweets using machine learning and deep learning methods. Specifically, we applied seven popular machine learning and deep learning algorithms, including Naive Bayes, Support Vector Machine, Random Forest, XGBoost, Multilayer Perceptron, Transformer Neural Network, and stacking and voting ensemble models, to build our customized classification model. We extracted a set of sample tweets posted during the 2019 outbreak of e-cigarette or vaping-related lung injury (EVALI) and created an annotated corpus to train and evaluate these models. After comparing the performance of each model, we found that the stacking ensemble model achieved the highest performance, with an F1-score of 0.97. All models achieved an F1-score of 0.90 or higher after hyperparameter tuning, and the ensemble learning model had the best average performance. Our findings provide informative guidelines and practical implications for the automated detection of themed social media data for public opinion and health surveillance purposes.
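To make the ensemble comparison concrete, here is a minimal sketch in standard-library Python of hard (majority) voting and of the F1-score used to rank the models. It is not the authors' implementation: it assumes binary labels (1 = vaping-related), assumes hard rather than soft voting (the abstract does not say which the voting ensemble used), and runs on hypothetical, hand-written predictions from three unnamed base models instead of the study's classifiers or annotated corpus. The names `majority_vote`, `f1_score`, and `model_a` through `model_c` are illustrative only.

```python
from typing import List, Sequence

# Assumption: binary labels, 1 = vaping-related tweet, 0 = not vaping-related.
POSITIVE = 1


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard voting: predictions[m][i] is base model m's 0/1 label for tweet i.

    A tweet is labeled positive only when more than half of the models say so;
    ties (possible with an even number of models) fall back to the negative class.
    """
    n_models = len(predictions)
    votes = []
    for labels in zip(*predictions):  # iterate tweet by tweet
        positive_votes = sum(1 for label in labels if label == POSITIVE)
        votes.append(POSITIVE if 2 * positive_votes > n_models else 0)
    return votes


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == POSITIVE and p == POSITIVE)
    fp = sum(1 for t, p in pairs if t != POSITIVE and p == POSITIVE)
    fn = sum(1 for t, p in pairs if t == POSITIVE and p != POSITIVE)
    if tp == 0:
        return 0.0  # no true positives: treat F1 as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels for six annotated tweets (toy data, not from the study).
y_true = [1, 1, 0, 0, 1, 0]

# Hypothetical predictions from three base classifiers, each making different errors.
# Tweet indices in the comments are 0-based.
base_predictions = {
    "model_a": [1, 0, 0, 0, 1, 0],  # misses tweet 1
    "model_b": [1, 1, 1, 0, 1, 0],  # false positive on tweet 2
    "model_c": [1, 1, 0, 1, 0, 0],  # false positive on tweet 3, misses tweet 4
}

ensemble = majority_vote(list(base_predictions.values()))

for name, preds in base_predictions.items():
    print(f"{name}: F1 = {f1_score(y_true, preds):.3f}")
print(f"voting ensemble: F1 = {f1_score(y_true, ensemble):.3f}")
```

Running it prints F1 = 0.800, 0.857, and 0.667 for the three base models and 1.000 for the voting ensemble. The toy labels were chosen so that each base model errs on a different tweet; voting helps only when base-model errors are not strongly correlated. Stacking, which the abstract reports as the best model, replaces the fixed majority rule with a second-level (meta) classifier trained on the base models' predictions, typically out-of-fold predictions so the meta-learner is not fit on outputs the base models produced for their own training data. scikit-learn's `StackingClassifier` and `VotingClassifier` (with `voting="hard"` or `"soft"`) provide library implementations of both strategies.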
Pages: 14