Identifying health related occupations of Twitter users through word embedding and deep neural networks

Cited by: 3
Authors
Zainab, Kazi [1]
Srivastava, Gautam [1, 2, 3]
Mago, Vijay [1 ]
Affiliations
[1] Lakehead Univ, Dept Comp Sci, Oliver Rd, Thunder Bay, ON, Canada
[2] Brandon Univ, Dept Math & Comp Sci, 270 18th St, Brandon, MB R7A 6A9, Canada
[3] China Med Univ, Res Ctr Interneural Comp, 91 Xueshi Rd, Taichung 40402, Taiwan
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Deep learning; Natural language processing; Text classification; Medical data; Twitter
DOI
10.1186/s12859-022-04933-2
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Twitter is a popular social networking site whose short messages, or "tweets", have been used extensively for research. However, little work has addressed mining medical professions, for example by detecting users' occupations from their biographical content. Such information can be used to build efficient recommender systems for cost-effective targeted advertising. Effective methods for identifying occupations are also needed because conventional classification methods rely on hand-crafted features. Although these methods can give favorable results on some classification problems, it remains extremely challenging for traditional classifiers to predict medical occupations accurately, since the task involves multiple occupational classes. This study therefore focuses on predicting the medical occupational class of users from their public biographical ("bio") content. We conducted our analysis by annotating the bio content of Twitter users. We propose combining word embedding with state-of-the-art neural network models: Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), Bidirectional Encoder Representations from Transformers (BERT), and A Lite BERT (ALBERT). We also observed that combining word embedding with these models removes the need to construct any particular attribute or feature. Word embedding represents the bio content as dense vectors, which are fed into the neural network models as a sequence of vectors.

Results: Accuracy, precision, recall, and F1-score show a significant difference between our method of combining word embedding with neural network models and the traditional methods. The scores indicate that the proposed approach outperformed traditional machine learning techniques for detecting medical occupations among users. ALBERT performed best among the deep learning networks, with an F1 score of 0.90.

Conclusion: In this study, we have presented a novel method of detecting the occupations of Twitter users engaged in the medical domain by merging word embedding with state-of-the-art neural networks. The outcomes demonstrate that our method can further advance the analysis of social media corpora without the trouble of developing computationally expensive features.
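
The abstract describes bio content being turned into dense vectors and passed to the networks as a sequence of vectors. The following is a minimal sketch of that input step only, not the authors' implementation. The toy bios, the whitespace tokenizer, the embedding dimension, the sequence length, and the randomly initialised embedding table are all assumptions made for illustration; a real pipeline would typically learn the table during training or load pretrained embeddings.

```python
import random

# Hypothetical toy bios; the paper's annotated dataset is not reproduced here.
bios = [
    "registered nurse and proud mom",
    "emergency physician coffee addict",
]

PAD, UNK = "<pad>", "<unk>"
EMBED_DIM = 4  # assumed; real embeddings usually have many more dimensions
MAX_LEN = 6    # assumed fixed sequence length


def build_vocab(texts):
    """Map each distinct lowercase token to an integer id."""
    vocab = {PAD: 0, UNK: 1}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def init_embeddings(vocab_size, dim, seed=0):
    """Random embedding table (an assumption); the padding row is all zeros."""
    rng = random.Random(seed)
    table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(vocab_size)]
    table[0] = [0.0] * dim
    return table


def encode(text, vocab, table, max_len):
    """Turn one bio into a fixed-length sequence of dense vectors."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in text.lower().split()]
    ids = ids[:max_len]                          # truncate long bios
    ids += [vocab[PAD]] * (max_len - len(ids))   # pad short bios
    return [table[i] for i in ids]


vocab = build_vocab(bios)
table = init_embeddings(len(vocab), EMBED_DIM)
sequence = encode("pediatric nurse", vocab, table, MAX_LEN)
print(len(sequence), len(sequence[0]))
```

Each bio becomes a `MAX_LEN` by `EMBED_DIM` list of vectors, which is the shape a recurrent model such as an LSTM or GRU would consume one time step at a time. Unknown words share the `<unk>` vector and padding positions are zero vectors, so no hand-built features are involved at this stage.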
Pages: 15