Interpretable segmentation of medical free-text records based on word embeddings

Cited by: 6
Authors
Dobrakowski, Adam Gabriel [1]
Mykowiecka, Agnieszka [2]
Marciniak, Malgorzata [2]
Jaworski, Wojciech [1]
Biecek, Przemyslaw [1, 3]
Affiliations
[1] Univ Warsaw, Banacha 2, Warsaw, Poland
[2] Polish Acad Sci, Inst Comp Sci, Jana Kazimierza 5, Warsaw, Poland
[3] Warsaw Univ Technol, Koszykowa 75, Warsaw, Poland
Keywords
Electronic health records; Natural language processing; Text clustering; Word embeddings
DOI
10.1007/s10844-021-00659-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Medical free-text records store a great deal of useful information that can be exploited in developing computer-supported medicine. However, extracting knowledge from unstructured text is difficult and depends on the language. In this paper, we apply Natural Language Processing methods to raw medical texts in Polish and propose a new methodology for clustering patients' visits. We (1) extract medical terminology from a corpus of free-text clinical records, (2) annotate the data with medical concepts, (3) compute vector representations of medical concepts and validate them on the proposed term analogy tasks, (4) compute visit representations as vectors, (5) introduce a new method for clustering patients' visits, and (6) apply the method to a corpus of 100,000 visits. We use several approaches to visual exploration that facilitate the interpretation of segments. With our method, we obtain stable and well-separated segments of visits, which are positively validated against final medical diagnoses. We show how an algorithm for segmenting medical free-text records may be used to aid medical doctors. In addition, we share an implementation of the described methods, with examples, as the open-source R package memr.
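Steps (3) to (5) of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the memr package: the 2-D embedding values, the concept names, the helper functions `visit_vector`, `analogy`, and `kmeans`, and the choice of averaging plus k-means are all assumptions made for illustration, since the abstract does not specify how visit vectors are built or which clustering algorithm is used.

```python
import math

# Toy 2-D concept embeddings (made-up values, for illustration only).
EMBEDDINGS = {
    "cough": (0.9, 0.1),
    "fever": (0.8, 0.2),
    "bronchitis": (0.85, 0.15),
    "rash": (0.1, 0.9),
    "itching": (0.2, 0.8),
    "dermatitis": (0.15, 0.85),
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def visit_vector(concepts):
    """Assumed visit representation: the mean of the visit's concept vectors."""
    return mean_vector([EMBEDDINGS[c] for c in concepts])


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via the vector offset b - a + c."""
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])
    )
    candidates = [w for w in EMBEDDINGS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain k-means with a deterministic start (the first k points)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels


# Hypothetical visits, each already annotated with medical concepts.
visits = [
    ["cough", "fever"],
    ["rash", "itching"],
    ["bronchitis", "cough"],
    ["dermatitis", "rash"],
]
labels = kmeans([visit_vector(v) for v in visits], k=2)
print(labels)
print(analogy("cough", "bronchitis", "rash"))
```

On this toy data the respiratory visits and the dermatological visits fall into separate segments. Averaging was chosen only because it keeps every visit in the same space as the concept vectors, which makes each segment easy to inspect through its nearest concepts.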
Pages: 447-465
Page count: 19
Related papers
50 in total
  • [1] Interpretable Segmentation of Medical Free-Text Records Based on Word Embeddings
    Dobrakowski, Adam Gabriel
    Mykowiecka, Agnieszka
    Marciniak, Malgorzata
    Jaworski, Wojciech
    Biecek, Przemyslaw
    FOUNDATIONS OF INTELLIGENT SYSTEMS (ISMIS 2020), 2020, 12117 : 45 - 55
  • [2] Interpretable segmentation of medical free-text records based on word embeddings
    Adam Gabriel Dobrakowski
    Agnieszka Mykowiecka
    Małgorzata Marciniak
    Wojciech Jaworski
    Przemysław Biecek
    Journal of Intelligent Information Systems, 2021, 57 : 447 - 465
  • [3] Mining free-text medical records
    Heinze, DT
    Morsch, ML
    Holbrook, J
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2001, : 254 - 258
  • [4] Statistical Section Segmentation in Free-Text Clinical Records
    Tepper, Michael
    Capurro, Daniel
    Xia, Fei
    Vanderwende, Lucy
    Yetisgen-Yildiz, Meliha
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 2001 - 2008
  • [5] Interpretable Word Embeddings For Medical Domain
    Jha, Kishlay
    Wang, Yaqing
    Xun, Guangxu
    Zhang, Aidong
    2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, : 1061 - 1066
  • [6] Automated de-identification of free-text medical records
    Ishna Neamatullah
    Margaret M Douglass
    Li-wei H Lehman
    Andrew Reisner
    Mauricio Villarroel
    William J Long
    Peter Szolovits
    George B Moody
    Roger G Mark
    Gari D Clifford
    BMC Medical Informatics and Decision Making, 8
  • [7] Automated de-identification of free-text medical records
    Neamatullah, Ishna
    Douglass, Margaret M.
    Lehman, Li-wei H.
    Reisner, Andrew
    Villarroel, Mauricio
    Long, William J.
    Szolovits, Peter
    Moody, George B.
    Mark, Roger G.
    Clifford, Gari D.
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2008, 8 (1)
  • [8] SEMANTIC LIMITATION IN FREE-TEXT SEARCH BASED ON MORPHOLOGICAL SEGMENTATION
    WENZEL, F
    NACHRICHTEN FUR DOKUMENTATION, 1980, 31 (01): 29 - 35
  • [9] De-identification of free-text medical records in health information exchange
    Zhou Tian-shu
    Li Peng-fei
    Li Jing-song
    PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON CLOUD COMPUTING AND INFORMATION SECURITY (CCIS 2013), 2013, 52 : 242 - 245
  • [10] Category of Allergy Identification from Free-Text Medical Records for Data Interoperability
    Lenivtceva, Iuliia
    Kashina, Mariya
    Kopanitsa, Georgy
    PHEALTH 2020: PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON WEARABLE MICRO AND NANO TECHNOLOGIES FOR PERSONALIZED HEALTH, 2020, 273 : 170 - 175