Breast, Lung and Liver Cancer Classification from Structured and Unstructured Data

Cited by: 0
Authors
Gonzalez-Beltran, Beatriz A. [1]
Reyes-Ortiz, Jose A. [1]
Montelongo-Gonzalez, Erick E. [1]
Affiliation
[1] Univ Autonoma Metropolitana, Dept Sistemas, Mexico City, DF, Mexico
Source
COMPUTACION Y SISTEMAS | 2022 / Vol. 26 / No. 1
Keywords
Cancer classification; structured and unstructured data; deep learning for unstructured data representation; machine learning models; electronic health records
DOI
10.13053/CyS-26-1-4167
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cancer is a worldwide public health problem. Machine learning and deep learning techniques hold great promise in healthcare because they can analyze Electronic Health Records (EHRs), which contain large collections of structured and unstructured data. However, most research has focused on structured data, even though valuable information is also found in physicians' plain-text notes. This paper therefore proposes an approach that classifies breast, liver, and lung cancer from structured and unstructured data obtained from the MIMIC-II clinical database, using machine learning and deep learning techniques. In particular, the Paragraph Vector algorithm is used as a deep learning approach to text representation. The goal of this work is to help physicians diagnose cancer early. The proposed approach was tested on a balanced dataset of breast, liver, and lung cancer patient records. Both the structured and the unstructured data are pre-processed, and the results are used as input variables for three machine learning models: Support Vector Machines (SVM), Multilayer Perceptron (MLP), and AdaBoost-SAMME. The scoring metrics for these models are then calculated under different training data configurations to choose the best-performing model for classification. The best-performing model was the MLP, which achieved 89% precision using unstructured data.
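The model-selection step described in the abstract (score each model under each training data configuration, then keep the best) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `macro_precision` and `best_configuration` are hypothetical, the choice of macro-averaged precision is an assumption (the abstract does not say how precision was averaged), and the predictions are made-up toy values rather than results from the paper.

```python
from collections import defaultdict


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class precision.

    Assumption: macro averaging; a class that is never predicted scores 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = defaultdict(int)   # correct predictions per class
    predicted = defaultdict(int)  # total predictions per class
    for truth, guess in zip(y_true, y_pred):
        predicted[guess] += 1
        if truth == guess:
            true_pos[guess] += 1
    per_class = [
        true_pos[c] / predicted[c] if predicted[c] else 0.0 for c in labels
    ]
    return sum(per_class) / len(per_class)


def best_configuration(predictions, y_true):
    """Score every (model, data configuration) pair and return the best one."""
    scores = {
        key: macro_precision(y_true, y_pred)
        for key, y_pred in predictions.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy labels and predictions, invented purely for illustration.
y_true = ["breast", "liver", "lung", "breast", "liver", "lung"]
predictions = {
    ("SVM", "structured"): ["breast", "liver", "liver", "breast", "lung", "lung"],
    ("MLP", "unstructured"): ["breast", "liver", "lung", "breast", "liver", "liver"],
    ("AdaBoost-SAMME", "both"): ["breast", "breast", "lung", "liver", "liver", "lung"],
}

best, scores = best_configuration(predictions, y_true)
for key, score in scores.items():
    print(key, round(score, 3))
print("best:", best)
```

In practice the predictions would come from trained classifiers evaluated on held-out records, and other metrics such as recall or F1 could be compared in the same way.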
Pages: 233-243
Page count: 11