Text mining for the Vaccine Adverse Event Reporting System: medical text classification using informative feature selection

Times cited: 73
Authors
Botsis, Taxiarchis [1, 2]
Nguyen, Michael D. [1]
Woo, Emily Jane [1]
Markatou, Marianthi [3, 4]
Ball, Robert [1]
Affiliations
[1] CBER, Off Biostat & Epidemiol, FDA, Rockville, MD 20852 USA
[2] Univ Tromso, Dept Comp Sci, Tromso, Norway
[3] Cornell Univ, Dept Stat Sci, New York, NY 10021 USA
[4] IBM TJ Watson Res Ctr, New York, NY USA
Keywords
RECORDS; MEDDRA
DOI
10.1136/amiajnl-2010-000022
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Objective: The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload.
Design: We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (N_pos = 237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets: important keywords, low-level patterns, and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations.
Measurements: Classifier performance was evaluated by macro-averaged recall, precision, and F-measure, and by Friedman's test; misclassification error rate analysis was also performed.
Results: The rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, but at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively).
Conclusion: Our validated results showed that it is possible to develop effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.
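The macro-averaged recall, precision, and F-measure named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `macro_metrics`, the label strings `"pos"` and `"neg"`, and the toy labels are assumptions made for illustration. The sketch also assumes one common convention, in which macro-F is the mean of the per-class F-measures; the paper may define it differently.

```python
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                  labels: Sequence[str] = ("pos", "neg")) -> Dict[str, float]:
    """Macro-average precision, recall, and F-measure over the given classes.

    Each class is scored one-vs-rest, then the per-class scores are averaged
    with equal weight, so the rare positive class counts as much as the
    common negative class.
    """
    precisions, recalls, f_scores = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators (class never predicted or never present).
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if (precision + recall) else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)
    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_scores) / n,
    }


if __name__ == "__main__":
    # Hypothetical toy labels, not VAERS data.
    truth = ["pos", "pos", "neg", "neg", "neg", "neg"]
    predicted = ["pos", "neg", "neg", "neg", "neg", "pos"]
    print(macro_metrics(truth, predicted))
```

Because every class carries equal weight, a classifier can raise macro-recall by catching more of the rare positive reports while still misclassifying more reports overall, which is consistent with the trade-off against misclassification error rate reported in the Results.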
Pages: 631-638
Number of pages: 8