Applications of Text Mining techniques to extract meaningful information from gastroenterology medical reports

Authors
Vallelunga, Rosarina [1]
Scarpino, Ileana [1]
Martinis, Maria Chiara [1,2]
Luzza, Francesco [3]
Zucco, Chiara [1,2]
Affiliations
[1] Magna Graecia Univ Catanzaro, Dept Med & Surg Sci, Viale Europa, I-88100 Catanzaro, Italy
[2] Magna Graecia Univ Catanzaro, Data Analyt Research Ctr, Viale Europa, I-88100 Catanzaro, Italy
[3] Magna Graecia Univ Catanzaro, Dept Hlth Sci, Viale Europa, I-88100 Catanzaro, Italy
Keywords
Text mining; Topic modeling; LDA; BERTopic; Gastroenterology reports; Inflammatory Bowel Disease (IBD)
DOI
10.1016/j.jocs.2024.102458
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Text mining techniques, particularly topic modeling, can be used for the automatic extraction of information from medical reports. The ability to autonomously analyze texts and identify topics within them can provide meaningful clinical insights that support physicians in diagnostic settings and enhance the characterization of intestinal diseases, leading to more efficient and automated systems. This study evaluates the effectiveness of Latent Dirichlet Allocation (LDA) and BERTopic in modeling topics from colonoscopy reports related to Crohn's Disease, Ulcerative Colitis, and Polyps. We compared these models in terms of their ability to identify clinically relevant topics, their influence on the performance of machine learning classifiers trained on the derived topic features, and their scalability. Our analysis, based on average results across five iterations of train-test splits, showed that BERTopic generally outperformed LDA in clustering metrics, achieving Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and Purity scores of 0.5637, 0.5953, and 0.8447, respectively, compared to LDA's scores of 0.5349, 0.5254, and 0.8149. Additionally, classifiers trained on BERTopic-derived features exhibited improved predictive accuracy and F1-scores, with Logistic Regression reaching a mean accuracy of 0.8464 and a mean F1-score of 0.8507, compared to 0.8319 and 0.8351 for LDA-based features. Despite BERTopic's overall superior performance, LDA demonstrated greater stability and interpretability, making it a viable option in scenarios where computational efficiency is a priority.
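The three clustering metrics named in the abstract (ARI, NMI, and Purity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy labels, and the choice of arithmetic-mean normalization for NMI are all assumptions made here for illustration, since the abstract does not state which variants were used.

```python
from collections import Counter
from math import comb, log


def contingency(labels_true, labels_pred):
    """Count documents for each (true class, predicted topic) pair."""
    return Counter(zip(labels_true, labels_pred))


def purity(labels_true, labels_pred):
    """Fraction of documents that match the majority class of their topic."""
    best = {}
    for (_, topic), count in contingency(labels_true, labels_pred).items():
        best[topic] = max(best.get(topic, 0), count)
    return sum(best.values()) / len(labels_true)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance; assumes at least two documents."""
    n = len(labels_true)
    sum_cells = sum(comb(c, 2) for c in contingency(labels_true, labels_pred).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single class and topic
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the mean of the two entropies (assumed variant)."""
    n = len(labels_true)
    rows, cols = Counter(labels_true), Counter(labels_pred)
    mi = sum(
        (c / n) * log(c * n / (rows[t] * cols[p]))
        for (t, p), c in contingency(labels_true, labels_pred).items()
    )
    denom = (entropy(labels_true) + entropy(labels_pred)) / 2
    return mi / denom if denom > 0 else 1.0


# Hypothetical toy data: diagnosis per report and the dominant topic assigned to it.
diagnoses = ["CD", "CD", "CD", "UC", "UC", "UC", "Polyp", "Polyp", "Polyp"]
topics = [0, 0, 1, 1, 1, 1, 2, 2, 2]

print("Purity:", purity(diagnoses, topics))
print("ARI:   ", adjusted_rand_index(diagnoses, topics))
print("NMI:   ", normalized_mutual_info(diagnoses, topics))
```

In this toy example one Crohn's Disease report lands in the topic dominated by Ulcerative Colitis, so all three scores fall below 1. Purity only rewards majority agreement within each topic, whereas ARI and NMI also penalize splitting a class across topics, which is one reason the three are usually reported together.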
Pages: 9