Topic Modeling for Interpretable Text Classification From EHRs

Cited by: 11
Authors
Rijcken, Emil
Kaymak, Uzay
Scheepers, Floortje
Mosteiro, Pablo
Zervanou, Kalliopi
Spruit, Marco
Affiliations
[1] Jheronimus Academy of Data Science, Eindhoven University of Technology, Eindhoven
[2] Department of Information and Computing Sciences, Utrecht University, Utrecht
[3] University Medical Center Utrecht, Utrecht
[4] Public Health and Primary Care (PHEG), Leiden University Medical Center, Leiden University, Leiden
[5] Leiden Institute of Advanced Computer Science (LIACS), Faculty of Science, Leiden University, Leiden
Source
FRONTIERS IN BIG DATA | 2022 / Vol. 5
Keywords
text classification; topic modeling; explainability; interpretability; electronic health records; psychiatry; natural language processing; information extraction
DOI
10.3389/fdata.2022.846930
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The clinical notes in electronic health records have many possibilities for predictive tasks in text classification. The interpretability of these classification models for the clinical domain is critical for decision making. Using topic models for text classification of electronic health records for a predictive task allows for the use of topics as features, thus making the text classification more interpretable. However, selecting the most effective topic model is not trivial. In this work, we propose considerations for selecting a suitable topic model based on the predictive performance and interpretability measure for text classification. We compare 17 different topic models in terms of both interpretability and predictive performance in an inpatient violence prediction task using clinical notes. We find no correlation between interpretability and predictive performance. In addition, our results show that although no model outperforms the other models on both variables, our proposed fuzzy topic modeling algorithm (FLSA-W) performs best in most settings for interpretability, whereas two state-of-the-art methods (ProdLDA and LSI) achieve the best predictive performance.
Pages: 11
Related papers
50 in total
  • [1] Multi-label dataless text classification with topic modeling
    Zha, Daochen
    Li, Chenliang
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 61 (01) : 137 - 160
  • [2] A Study on Topic Modeling for Feature Space Reduction in Text Classification
    Pfeifer, Daniel
    Leidner, Jochen L.
    [J]. FLEXIBLE QUERY ANSWERING SYSTEMS, 2019, 11529 : 403 - 412
  • [3] Dataless Text Classification: A Topic Modeling Approach with Document Manifold
    Li, Ximing
    Li, Changchun
    Chi, Jinjin
    Ouyang, Jihong
    Li, Chenliang
    [J]. CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 973 - 982
  • [4] Multi-label dataless text classification with topic modeling
    Daochen Zha
    Chenliang Li
    [J]. Knowledge and Information Systems, 2019, 61 : 137 - 160
  • [5] Lexicon Induction for Interpretable Text Classification
    Clos, Jeremie
    Wiratunga, Nirmalie
    [J]. RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES (TPDL 2017), 2017, 10450 : 498 - 510
  • [6] Text mining of CHO bioprocess bibliome: Topic modeling and document classification
    Wang, Qinghua
    Olshin, Jonathan
    Vijay-Shanker, K.
    Wu, Cathy H.
    [J]. PLOS ONE, 2023, 18 (04):
  • [7] ZeroBERTo: Leveraging Zero-Shot Text Classification by Topic Modeling
    Alcoforado, Alexandre
    Ferraz, Thomas Palmeira
    Gerber, Rodrigo
    Bustos, Enzo
    Oliveira, Andre Seidel
    Veloso, Bruno Miguel
    Siqueira, Fabio Levy
    Reali Costa, Anna Helena
    [J]. COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 125 - 136
  • [8] Building Vietnamese Topic Modeling Based on Core Terms and Applying in Text Classification
    Ha Nguyen Thi Thu
    Tinh Dao Thanh
    Thanh Nguyen Hai
    Vinh Ho Ngoc
    [J]. 2015 FIFTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT2015), 2015, : 1284 - 1288
  • [9] A Survey of Topic Models in Text Classification
    Xia, Linzhong
    Luo, Dean
    Zhang, Chunxiao
    Wu, Zhou
    [J]. 2019 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND BIG DATA (ICAIBD 2019), 2019, : 244 - 250
  • [10] SPARSE TOPIC MODEL FOR TEXT CLASSIFICATION
    Liu, Tao
    [J]. PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 1916 - 1920