Applications of Text Mining techniques to extract meaningful information from gastroenterology medical reports

Times cited: 0
Authors
Vallelunga, Rosarina [1 ]
Scarpino, Ileana [1 ]
Martinis, Maria Chiara [1 ,2 ]
Luzza, Francesco [3 ]
Zucco, Chiara [1 ,2 ]
Affiliations
[1] Magna Graecia Univ Catanzaro, Dept Med & Surg Sci, Viale Europa, I-88100 Catanzaro, Italy
[2] Magna Graecia Univ Catanzaro, Data Analyt Research Ctr, Viale Europa, I-88100 Catanzaro, Italy
[3] Magna Graecia Univ Catanzaro, Dept Hlth Sci, Viale Europa, I-88100 Catanzaro, Italy
Keywords
Text mining; Topic modeling; LDA; BERTopic; Gastroenterology reports; Inflammatory Bowel Disease (IBD)
DOI
10.1016/j.jocs.2024.102458
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Text mining techniques, particularly topic modeling, can automatically extract information from medical reports. Automatically analyzing texts and identifying the topics within them can provide clinical insights that support physicians in diagnostic settings and improve the characterization of intestinal diseases, enabling more efficient, automated systems. This study evaluates Latent Dirichlet Allocation (LDA) and BERTopic for modeling topics from colonoscopy reports related to Crohn's Disease, Ulcerative Colitis, and Polyps. We compared the two models on three criteria: their ability to identify clinically relevant topics, their influence on the performance of machine learning classifiers trained on the derived topic features, and their scalability. Averaged over five train-test splits, BERTopic generally outperformed LDA on clustering metrics, achieving Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and Purity scores of 0.5637, 0.5953, and 0.8447, respectively, compared with 0.5349, 0.5254, and 0.8149 for LDA. Classifiers trained on BERTopic-derived features also achieved higher accuracy and F1-scores: Logistic Regression reached a mean accuracy of 0.8464 and a mean F1-score of 0.8507, compared with 0.8319 and 0.8351 for LDA-based features. Despite BERTopic's overall superior performance, LDA demonstrated greater stability and interpretability, making it a viable option in scenarios where computational efficiency is a priority.
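The three clustering metrics named in the abstract (ARI, NMI, and Purity) can be illustrated with a minimal standard-library sketch. This is not the authors' evaluation code: the function names, the toy labels ("CD", "UC", "Polyp"), and the cluster assignments are hypothetical, and normalizing NMI by the arithmetic mean of the two entropies is an assumption, since the abstract does not state which normalization was used.

```python
from collections import Counter
from math import comb, log


def _contingency(true_labels, cluster_ids):
    # Count how many documents fall in each (cluster, true label) cell.
    return Counter(zip(cluster_ids, true_labels))


def purity(true_labels, cluster_ids):
    # Fraction of documents that carry the majority label of their cluster.
    best = {}
    for (cluster, _label), count in _contingency(true_labels, cluster_ids).items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(true_labels)


def adjusted_rand_index(true_labels, cluster_ids):
    # Pair-counting agreement between two partitions, corrected for chance.
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0
    table = _contingency(true_labels, cluster_ids)
    sum_cells = sum(comb(v, 2) for v in table.values())
    sum_rows = sum(comb(v, 2) for v in Counter(cluster_ids).values())
    sum_cols = sum(comb(v, 2) for v in Counter(true_labels).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_information(true_labels, cluster_ids):
    # Mutual information divided by the arithmetic mean of the two entropies
    # (assumed normalization; other choices exist).
    n = len(true_labels)
    rows = Counter(cluster_ids)
    cols = Counter(true_labels)
    table = _contingency(true_labels, cluster_ids)
    mi = sum(
        (v / n) * log(v * n / (rows[c] * cols[t])) for (c, t), v in table.items()
    )
    h_rows = -sum((v / n) * log(v / n) for v in rows.values())
    h_cols = -sum((v / n) * log(v / n) for v in cols.values())
    denom = (h_rows + h_cols) / 2
    return mi / denom if denom > 0 else 1.0


if __name__ == "__main__":
    # Hypothetical toy data: diagnosis labels and topic-derived cluster ids.
    diagnoses = ["CD", "CD", "CD", "UC", "UC", "UC", "Polyp", "Polyp", "Polyp"]
    clusters = [0, 0, 1, 1, 1, 1, 2, 2, 2]
    print("Purity:", purity(diagnoses, clusters))
    print("ARI:", adjusted_rand_index(diagnoses, clusters))
    print("NMI:", normalized_mutual_information(diagnoses, clusters))
```

Purity rewards clusters dominated by a single diagnosis but does not penalize having many small clusters. ARI and NMI compare the whole partition against the labels, which may be why the abstract reports all three together.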
Pages: 9