Application of Machine Learning and Word Embeddings in the Classification of Cancer Diagnosis Using Patient Anamnesis

Cited by: 21
Authors
Ramos Magna, Andres Alejandro [1]
Allende-Cid, Hector [1]
Taramasco, Carla [2]
Becerra, Carlos [2]
Figueroa, Rosa L. [3]
Affiliations
[1] Pontificia Univ Catolica Valparaiso, Escuela Ingn Informat, Valparaiso 2374631, Chile
[2] Univ Valparaiso, Escuela Ingn Civil Informat, Valparaiso 2362905, Chile
[3] Univ Concepcion, Dept Ingn Elect, Concepcion 4070409, Chile
Source
IEEE ACCESS | 2020 / Volume 8 / Issue 08
Keywords
History; Medical diagnostic imaging; Breast cancer; Natural language processing; Natural language processing (NLP); machine learning; deep learning; recommendation system; anamnesis; BIDIRECTIONAL LSTM; ICD-9-CM;
DOI
10.1109/ACCESS.2020.3000075
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Currently, one of the main challenges for information systems in healthcare is focused on support for health professionals regarding disease classifications. This work presents an innovative method for a recommendation system for the diagnosis of breast cancer using patient medical histories. In this proposal, techniques of natural language processing (NLP) were implemented on real datasets: one comprised 160,560 medical histories of anonymous patients from a hospital in Chile for the following categories: breast cancer, cysts and nodules, other cancer, breast cancer surgeries and other diagnoses; and the other dataset was obtained from the MIMIC III dataset. With the application of word-embedding techniques, such as word2vec's skip-gram and BERT, and machine learning techniques, a recommendation system as a tool to support the physician's decision-making was implemented. The obtained results demonstrate that using word embeddings can define a good-quality recommendation system. The results of 20 experiments with 5-fold cross-validation for anamnesis written in Spanish yielded an F1 of 0.980 +/- 0.0014 on the classification of 'cancer' versus 'not cancer' and 0.986 +/- 0.0014 for 'breast cancer' versus 'other cancer'. Similar results were obtained with the MIMIC III dataset.
Pages: 106198-106213
Number of pages: 16