Classifying adverse drug reactions from imbalanced Twitter data

Cited by: 18
Authors
Dai, Hong-Jie [1, 2]
Wang, Chen-Kai [3 ]
Affiliations
[1] Natl Kaohsiung Univ Sci & Technol, Dept Elect Engn, Kaohsiung, Taiwan
[2] Kaohsiung Med Univ, Post Baccalaureate Med, Kaohsiung, Taiwan
[3] Chunghwa Telecom Labs, Big Data Labs, Taoyuan, Taiwan
Keywords
Adverse drug reaction; Imbalanced data classification; Word embeddings; Social media; Synthetic minority over-sampling technique; Pharmacovigilance; Text classification; Big data
DOI
10.1016/j.ijmedinf.2019.05.017
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Background: Social media are now widely used by the general public to create and share messages about their health. As social media usage grows worldwide, more users post information related to adverse drug reactions (ADRs). Mining social media data for this information can support pharmacological post-marketing surveillance and monitoring. Although the concept of using social media to facilitate pharmacovigilance is convincing, constructing automatic ADR detection systems remains a challenge because corpora compiled from social media tend to be highly imbalanced, which is a major obstacle to developing classifiers with reliable performance.
Methods: Several methods have been proposed to address the challenge of imbalanced corpora. However, we are not aware of any study that has investigated the effectiveness of strategies for handling imbalanced data in the context of ADR detection from social media. We therefore evaluated a variety of techniques for imbalanced data and proposed a novel word embedding-based synthetic minority over-sampling technique (WESMOTE), which synthesizes new training examples from sentence representations based on word embeddings. We compared the performance of all methods on two large imbalanced datasets released for the purpose of detecting ADR posts.
Results: Compared with state-of-the-art approaches, the classifiers that incorporated imbalanced classification techniques achieved comparable or better F-scores. All of our best-performing configurations combined random under-sampling with other techniques, including the proposed WESMOTE, boosting, and ensembles, implying that integrating these approaches with under-sampling provides a reliable solution for large imbalanced social media datasets. Furthermore, ensemble-based methods such as vote-based under-sampling (VUE) and random under-sampling boosting can be alternatives to the hybrid synthetic methods because both increase the diversity of the weak classifiers created, leading to better recall and overall F-scores for the minority classes.
Conclusions: Data collected from social media are usually very large and highly imbalanced. To maximize the performance of a classifier trained on such data, strategies for handling the imbalance are required. We considered several practical methods for handling imbalanced Twitter data and reported their performance on the binary ADR classification task. The following practical insights were gained: 1) For text classification, the proposed word embedding-based synthetic minority over-sampling technique is more effective than traditional synthetic over-sampling methods. 2) When large amounts of training data are available, imbalanced strategies combined with under-sampling techniques are preferred. 3) Advanced methods do not guarantee better performance than simpler ones such as VUE, which achieved high performance with advantages such as faster building time and ease of development.
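The abstract does not spell out how WESMOTE builds its sentence representations or picks neighbours, so the following is only a minimal sketch of the general idea, under stated assumptions: a sentence is represented by the average of its word vectors, new minority examples are created by standard SMOTE-style interpolation between a minority vector and one of its nearest minority neighbours, and the majority class is reduced by random under-sampling. All function names, the toy embeddings, and the toy tweets are hypothetical and do not come from the paper.

```python
import random

def sentence_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens (assumed sentence representation)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim  # no known token: fall back to the zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote_on_vectors(minority, n_new, k=2, rng=None):
    """SMOTE-style over-sampling: interpolate between a minority vector
    and one of its k nearest minority neighbours."""
    if rng is None:
        rng = random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority vectors to interpolate")
    synthetic = []
    for _ in range(n_new):
        seed = rng.choice(minority)
        others = [v for v in minority if v is not seed]
        neighbours = sorted(others, key=lambda v: squared_distance(seed, v))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment seed -> neighbour
        synthetic.append([s + gap * (n - s) for s, n in zip(seed, neighbour)])
    return synthetic

def random_under_sample(majority, n_keep, rng=None):
    """Keep a random subset of the majority class."""
    if rng is None:
        rng = random.Random(0)
    return rng.sample(majority, min(n_keep, len(majority)))

# Toy 2-dimensional embeddings and tokenized tweets (illustrative only).
toy_embeddings = {
    "headache": [0.9, 0.1], "nausea": [0.8, 0.3], "rash": [0.7, 0.2],
    "coffee": [0.1, 0.9], "weather": [0.2, 0.8], "movie": [0.0, 0.7],
}
adr_tweets = [["headache", "nausea"], ["rash"], ["nausea", "rash"]]
other_tweets = [["coffee"], ["weather"], ["movie"], ["coffee", "movie"],
                ["weather", "coffee"], ["movie", "weather"]]

adr_vecs = [sentence_vector(t, toy_embeddings, 2) for t in adr_tweets]
other_vecs = [sentence_vector(t, toy_embeddings, 2) for t in other_tweets]

# Hybrid strategy: over-sample the minority class, under-sample the majority.
balanced_minority = adr_vecs + smote_on_vectors(adr_vecs, n_new=2)
balanced_majority = random_under_sample(other_vecs, n_keep=len(balanced_minority))
print(len(balanced_minority), len(balanced_majority))
```

Interpolating in embedding space rather than in a sparse bag-of-words space keeps the synthetic points in a dense, low-dimensional region, which is one plausible reason such an approach could suit short texts; the paper itself should be consulted for the actual procedure and parameters.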
Pages: 122-132
Page count: 11