Topic Modeling for Interpretable Text Classification From EHRs

Cited by: 11

Authors:

Rijcken, Emil

Kaymak, Uzay

Scheepers, Floortje

Mosteiro, Pablo

Zervanou, Kalliopi

Spruit, Marco

Affiliations:

[1] Jheronimus Academy of Data Science, Eindhoven University of Technology, Eindhoven

[2] Department of Information and Computing Sciences, Utrecht University, Utrecht

[3] University Medical Center Utrecht, Utrecht

[4] Public Health and Primary Care (PHEG), Leiden University Medical Center, Leiden University, Leiden

[5] Leiden Institute of Advanced Computer Science (LIACS), Faculty of Science, Leiden University, Leiden

Source:

FRONTIERS IN BIG DATA | 2022 / Volume 5

Keywords:

text classification; topic modeling; explainability; interpretability; electronic health records; psychiatry; natural language processing; information extraction;

DOI:

10.3389/fdata.2022.846930

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

The clinical notes in electronic health records have many possibilities for predictive tasks in text classification. The interpretability of these classification models for the clinical domain is critical for decision making. Using topic models for text classification of electronic health records for a predictive task allows for the use of topics as features, thus making the text classification more interpretable. However, selecting the most effective topic model is not trivial. In this work, we propose considerations for selecting a suitable topic model based on the predictive performance and interpretability measure for text classification. We compare 17 different topic models in terms of both interpretability and predictive performance in an inpatient violence prediction task using clinical notes. We find no correlation between interpretability and predictive performance. In addition, our results show that although no model outperforms the other models on both variables, our proposed fuzzy topic modeling algorithm (FLSA-W) performs best in most settings for interpretability, whereas two state-of-the-art methods (ProdLDA and LSI) achieve the best predictive performance.


Pages: 11
