Identifying health related occupations of Twitter users through word embedding and deep neural networks

Cited by: 3

Authors:

Zainab, Kazi [1]

Srivastava, Gautam [1,2,3]

Mago, Vijay [1]

Affiliations:

[1] Lakehead Univ, Dept Comp Sci, Oliver Rd, Thunder Bay, ON, Canada

[2] Brandon Univ, Dept Math & Comp Sci, 270 18th St, Brandon, MB R7A 6A9, Canada

[3] China Med Univ, Res Ctr Interneural Comp, 91 Xueshi Rd, Taichung 40402, Taiwan

Source:

BMC BIOINFORMATICS | 2022 / Vol. 22 / Suppl. 10

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

Deep learning; Natural language processing; Text classification; Medical data; Twitter;

DOI:

10.1186/s12859-022-04933-2

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Background: Twitter is a popular social networking site whose short messages, or "tweets", have been used extensively for research. However, little research has addressed mining medical professions, such as detecting users' occupations from their biographical content. Mining such professions can support efficient recommender systems for cost-effective targeted advertising. Effective methods for identifying users' occupations are also needed because conventional classification methods rely on hand-crafted features. Although such features may yield favorable classification results, it remains extremely challenging for traditional classifiers to predict medical occupations accurately, since the task involves multiple occupation classes. This study therefore focuses on predicting the medical occupational class of users from their public biographical ("Bio") content. We conducted our analysis by annotating the bio content of Twitter users. We propose a method that combines word embedding with state-of-the-art neural network models: Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), Bidirectional Encoder Representations from Transformers (BERT), and A Lite BERT (ALBERT). We also observed that combining word embedding with the neural network models removes the need to construct any particular attribute or feature. With word embedding, the bio content is represented as dense vectors, which are fed into the neural network models as a sequence of vectors.

Results: Accuracy, precision, recall, and F1-score show a significant difference between our method of combining word embedding with neural network models and the traditional methods. The scores indicate that the proposed approach outperforms traditional machine learning techniques for detecting medical occupations among users. ALBERT performed best among the deep learning networks, with an F1 score of 0.90.

Conclusion: In this study, we presented a novel method of detecting the occupations of Twitter users engaged in the medical domain by combining word embedding with state-of-the-art neural networks. The results demonstrate that our method can advance the analysis of social media corpora without the need to develop computationally expensive features.
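
The abstract's step of turning bio text into a sequence of dense vectors can be illustrated with a minimal sketch. This is not the authors' implementation: the example bios, the tokenizer, the embedding dimension (4), the sequence length (6), and the randomly initialised embedding table are all assumptions made for illustration, and a trained or pretrained embedding would replace the random table in practice.

```python
import random

# Hypothetical example bios; the study's annotated data is not reproduced here.
bios = [
    "Registered nurse and proud mom",
    "Pharmacist. Coffee lover.",
]

PAD, UNK = "<pad>", "<unk>"
EMBED_DIM = 4   # assumed; real embeddings usually have far more dimensions
MAX_LEN = 6     # assumed fixed sequence length


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?;:\"'()") for word in text.lower().split())
    return [t for t in tokens if t]


def build_vocab(texts):
    """Map each distinct token to an integer id; ids 0 and 1 are reserved."""
    vocab = {PAD: 0, UNK: 1}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab, max_len=MAX_LEN):
    """Convert a bio to a fixed-length list of token ids (truncate or pad)."""
    ids = [vocab.get(t, vocab[UNK]) for t in tokenize(text)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))


def make_embedding_table(vocab_size, dim=EMBED_DIM, seed=0):
    """Random table standing in for a trained or pretrained embedding."""
    rng = random.Random(seed)
    table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
             for _ in range(vocab_size)]
    table[0] = [0.0] * dim  # padding maps to the zero vector
    return table


def embed(text, vocab, table):
    """Return the bio as a sequence of MAX_LEN dense vectors."""
    return [table[i] for i in encode(text, vocab)]


vocab = build_vocab(bios)
table = make_embedding_table(len(vocab))
sequence = embed(bios[1], vocab, table)
print(len(sequence), len(sequence[0]))  # sequence length, embedding dimension
```

A sequence of this shape is what a recurrent model such as an LSTM or GRU would consume one vector at a time; the classifier itself is omitted here.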

Pages: 15

50 items in total

[1] Identifying health related occupations of Twitter users through word embedding and deep neural networks
Kazi Zainab
Gautam Srivastava
Vijay Mago
BMC Bioinformatics, 22
[2] Identifying antimicrobial peptides using word embedding with deep recurrent neural networks
Hamid, Md-Nafiz
Friedberg, Iddo
BIOINFORMATICS, 2019, 35 (12) : 2009 - 2016
[3] Identifying tweets of personal health experience through word embedding and LSTM neural network
Jiang, Keyuan
Feng, Shichao
Song, Qunhao
Calix, Ricardo A.
Gupta, Matrika
Bernard, Gordon R.
BMC BIOINFORMATICS, 2018, 19
[4] Identifying tweets of personal health experience through word embedding and LSTM neural network
Keyuan Jiang
Shichao Feng
Qunhao Song
Ricardo A. Calix
Matrika Gupta
Gordon R. Bernard
BMC Bioinformatics, 19
[5] Detecting Adverse Drug Reactions on Twitter with Convolutional Neural Networks and Word Embedding Features
Masino A.J.
Forsyth D.
Fiks A.G.
Journal of Healthcare Informatics Research, 2018, 2 (1-2) : 25 - 43
[6] Automatic personality prediction from Indonesian user on twitter using word embedding and neural networks
Jeremy, Nicholaus Hendrik
Suhartono, Derwin
5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020, 2021, 179 : 416 - 422
[7] Automatic Hate Speech Detection Using Deep Neural Networks and Word Embedding
Ebenezer Ojo, Olumide
Ta, Thang-Hoang
Gelbukh, Alexander
Calvo, Hiram
Sidorov, Grigori
Oluwayemisi Adebanji, Olaronke
COMPUTACION Y SISTEMAS, 2022, 26 (02) : 1007 - 1013
[8] Identifying Personal Health Experience Tweets with Deep Neural Networks
Jiang, Keyuan
Gupta, Ravish
Gupta, Matrika
Calix, Ricardo A.
Bernard, Gordon R.
2017 39TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2017 : 1174 - 1177
[9] Embedding Watermarks into Deep Neural Networks
Uchida, Yusuke
Nagai, Yuki
Sakazawa, Shigeyuki
Satoh, Shin'ichi
PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017 : 274 - 282
[10] Identifying Smartphone Users Based on Activities in Daily Living Using Deep Neural Networks
Mekruksavanich, Sakorn
Jitpattanakul, Anuchit
INFORMATION, 2024, 15 (01)