Application of Machine Learning and Word Embeddings in the Classification of Cancer Diagnosis Using Patient Anamnesis

Cited by: 21
Authors
Ramos Magna, Andres Alejandro [1]
Allende-Cid, Hector [1]
Taramasco, Carla [2]
Becerra, Carlos [2]
Figueroa, Rosa L. [3]
Affiliations
[1] Pontificia Univ Catolica Valparaiso, Escuela Ingn Informat, Valparaiso 2374631, Chile
[2] Univ Valparaiso, Escuela Ingn Civil Informat, Valparaiso 2362905, Chile
[3] Univ Concepcion, Dept Ingn Elect, Concepcion 4070409, Chile
Source
IEEE ACCESS | 2020 / Volume 8 / Issue 08
Keywords
History; Medical diagnostic imaging; Breast cancer; Natural language processing (NLP); machine learning; deep learning; recommendation system; anamnesis; bidirectional LSTM; ICD-9-CM
DOI
10.1109/ACCESS.2020.3000075
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Currently, one of the main challenges for information systems in healthcare is focused on support for health professionals regarding disease classifications. This work presents an innovative method for a recommendation system for the diagnosis of breast cancer using patient medical histories. In this proposal, techniques of natural language processing (NLP) were implemented on real datasets: one comprised 160,560 medical histories of anonymous patients from a hospital in Chile for the following categories: breast cancer, cysts and nodules, other cancer, breast cancer surgeries and other diagnoses; and the other dataset was obtained from the MIMIC III dataset. With the application of word-embedding techniques, such as word2vec's skip-gram and BERT, and machine learning techniques, a recommendation system as a tool to support the physician's decision-making was implemented. The obtained results demonstrate that using word embeddings can define a good-quality recommendation system. The results of 20 experiments with 5-fold cross-validation for anamnesis written in Spanish yielded an F1 of 0.980 +/- 0.0014 on the classification of 'cancer' versus 'not cancer' and 0.986 +/- 0.0014 for 'breast cancer' versus 'other cancer'. Similar results were obtained with the MIMIC III dataset.
Pages: 106198 - 106213
Page count: 16
Related papers
50 in total
  • [21] Patient care classification using machine learning techniques
    Melhem, Shatha
    Al-Aiad, Ahmad
    Al-Ayyad, Muhammad Saleh
    2021 12TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION SYSTEMS (ICICS), 2021, : 57 - 62
  • [22] Malware detection using machine learning based on word2vec embeddings of machine code instructions
    Popov, Igor
    2017 SIBERIAN SYMPOSIUM ON DATA SCIENCE AND ENGINEERING (SSDSE), 2017, : 1 - 4
  • [23] Multicategory classification using an Extreme Learning Machine for Microarray gene expression cancer diagnosis
    Zhang, Runxuan
    Huang, Guang-Bin
    Sundararajan, Narasimhan
    Saratchandran, P.
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2007, 4 (03) : 485 - 495
  • [24] Classification of Phishing Email Using Word Embedding and Machine Learning Techniques
    Somesha M.
    Pais A.R.
    Journal of Cyber Security and Mobility, 2022, 11 (03) : 279 - 320
  • [25] An application of machine learning in the criterion updating of diagnosis cancer
    Li, HY
    Li, DC
    Zhang, CH
    Nie, SB
    PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND BRAIN, VOLS 1-3, 2005, : 187 - 190
  • [26] Classification of Application Traffic Using Tensorflow Machine Learning
    Park, Jee-Tae
    Shim, Kyu-Seok
    Lee, Sung-Ho
    Kim, Myung-Sup
    2017 19TH ASIA-PACIFIC NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM (APNOMS 2017): MANAGING A WORLD OF THINGS, 2017, : 391 - 394
  • [27] Application of Machine Learning for Drone Classification using Radars
    Hudson, Sinclair
    Balaji, Bhashyam
    SIGNAL PROCESSING, SENSOR/INFORMATION FUSION, AND TARGET RECOGNITION XXX, 2021, 11756
  • [28] Detecting Malicious URLs Based on Machine Learning Algorithms and Word Embeddings
    Crisan, Andrei
    Florea, Gabriel
    Halasz, Lorand
    Lemnaru, Camelia
    Oprisa, Ciprian
    2020 IEEE 16TH INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTER COMMUNICATION AND PROCESSING (ICCP 2020), 2020, : 187 - 193
  • [29] Learning Bilingual Word Embeddings Using Lexical Definitions
    Shi, Weijia
    Chen, Muhao
    Tian, Yingtao
    Chang, Kai-Wei
    4TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP (REPL4NLP-2019), 2019, : 142 - 147
  • [30] Bengali Word Embeddings and It's Application in Solving Document Classification Problem
    Ahmad, Adnan
    Amin, Mohammad Ruhul
    PROCEEDINGS OF THE 2016 19TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2016, : 425 - 430