Early Diagnosis of Pancreatic Cancer via Machine Learning Analysis of a National Electronic Medical Record Database

Times cited: 1
Authors
Matchaba, Siyabonga [1 ,2 ]
Fellague-Chebra, Rafik [3 ]
Purushottam, Purushottam [4 ]
Johns, Adam [1 ]
Affiliations
[1] Novartis Oncol, Hlth Econ & Evidence Dev, 4110 A 335,1 Hlth Plaza, E Hanover, NJ 07936 USA
[2] Mendel, San Jose, CA USA
[3] Novartis Pharma SAS, Rueil Malmaison, Paris, France
[4] Novartis Healthcare Private Ltd, Hyderabad, India
Source
JCO CLINICAL CANCER INFORMATICS, 2023, 7
Keywords
CLINICAL-PREDICTION MODEL; HEALTH RECORDS; ASSESS RISK; EPIDEMIOLOGY; NETWORK;
DOI
10.1200/CCI.23.00076
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214;
Abstract
PURPOSE Pancreatic cancer (PaC) is often diagnosed at advanced stages, resulting in one of the lowest survival rates among patients with cancer. The purpose of this study was to investigate whether machine learning (ML) models can predict with high sensitivity and specificity an increased risk for PaC ahead of clinical diagnosis.

METHODS The Optum deidentified electronic health record (EHR) data set was used to extract 1-year data for each patient and to sample for PaC diagnosis, the number of interactions with the health care system, and unique demographic and clinical features. Data for patients with PaC diagnosis were collected between 1 and 2 years before the diagnosis. Standard binary classification ML models were used on training and testing data sets. Data analyses were performed using the scikit-learn package version 1.0.1.

RESULTS The data set consisted of 18,987 patient EHRs collected between December 31, 2007, and December 31, 2017. EHRs with 10 unique features and at least three health care interactions were used for model training (N = 15,189; n = 8,438 [56%] with PaC) and testing (N = 3,798; n = 2,127 [56%] with PaC). The ensemble model achieved an AUC of 0.89, a sensitivity of 85.61%, and a specificity of 76.18% on the testing data set and produced superior results compared with other binary classifiers. Increasing unique health care interactions to nine failed to improve the AUC score. When the testing data set was enlarged to 5,696 patients, the ensemble model achieved an AUC of 0.92 and a specificity of 93.21%, but the sensitivity was compromised.

CONCLUSION The ensemble model exceeded the state-of-the-art level of performance for prediction of PaC ahead of clinical diagnosis with a minimal clinically guided input, providing a potential strategy for selection of high-risk patients for further screening.
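The abstract reports AUC, sensitivity, and specificity without defining them, and its patient counts imply roughly an 80/20 train/test split. The following is a minimal sketch in plain Python that makes those definitions explicit and checks the reported counts. It is not the authors' scikit-learn pipeline (the abstract does not specify the ensemble's composition). The function names, the toy labels and scores, and the 0.5 decision threshold are assumptions made for illustration. Only the patient counts are taken from the abstract.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = PaC, 0 = no PaC."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: share of true PaC cases flagged; specificity: share of non-cases cleared.
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented data, hypothetical 0.5 threshold).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(sens, spec, auc)  # 0.75 0.75 0.875

# Consistency check on the counts reported in the abstract.
train_n, train_pac = 15_189, 8_438
test_n, test_pac = 3_798, 2_127
total_n = train_n + test_n                          # 18987, matching the stated data set size
test_fraction = round(test_n / total_n, 2)          # 0.2, i.e. about an 80/20 split
train_prevalence = round(100 * train_pac / train_n)  # 56 (%)
test_prevalence = round(100 * test_pac / test_n)     # 56 (%)
print(total_n, test_fraction, train_prevalence, test_prevalence)
```

Because about 56% of both subsets carry a PaC diagnosis, the reported sensitivity and specificity describe a roughly balanced sample rather than population-level prevalence, which is worth keeping in mind when reading the headline figures.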
Pages: 11
Related papers (first 10 of 50 shown)
  • [1] Early Diagnosis of Pancreatic Cancer via Machine Learning Analysis of a National Electronic Medical Record Database
    Matchaba, Siyabonga
    Fellague-Chebra, Rafik
    Purushottam, Purushottam
    Johns, Adam
    [J]. JCO CLINICAL CANCER INFORMATICS, 2023, 7
  • [2] An assessment of diagnosis and treatment of COPD in primary care via an electronic medical record database
    Joish, V
    Stockdale, B
    Brady, E
    Dirani, R
    Brixner, D
    [J]. VALUE IN HEALTH, 2005, 8 (03) : 325 - 325
  • [3] Electronic medical record: research tool for pancreatic cancer?
    Arous, Edward J.
    McDade, Theodore P.
    Smith, Jillian K.
    Chau, Sing
    Sullivan, Mary E.
    Zottola, Ralph J.
    Ranauro, Paul J.
    Shah, Shimul A.
    Whalen, Giles F.
    Tseng, Jennifer F.
    [J]. JOURNAL OF SURGICAL RESEARCH, 2014, 187 (02) : 466 - 470
  • [4] A novel tool for the accurate and affordable early diagnosis of pancreatic cancer via machine learning and bioinformatics.
    Goel, Siya
    Honorio, Jean
    [J]. CANCER RESEARCH, 2021, 81 (13)
  • [5] Machine learning of clinical performance in a pancreatic cancer database
    Hayward, John
    Alvarez, Sergio A.
    Ruiz, Carolina
    Sullivan, Mary
    Tseng, Jennifer
    Whalen, Giles
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2010, 49 (03) : 187 - 195
  • [6] Characterization of Pediatric Chronic Cough Via Electronic Medical Record Database
    Gavin, K.
    Hoffman, M.
    Tam-Williams, J. B.
    Davis, S.
    [J]. AMERICAN JOURNAL OF RESPIRATORY AND CRITICAL CARE MEDICINE, 2023, 207
  • [7] Machine Learning Approaches to Support Medical Imaging Diagnosis of Pancreatic Cancer - A Scoping Review
    Tavares, Florbela
    Rosa, Gilberto
    Henriques, Ines
    Rocha, Nelson Pacheco
    [J]. GOOD PRACTICES AND NEW PERSPECTIVES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 2, WORLDCIST 2024, 2024, 986 : 129 - 138
  • [8] Early diagnosis of pancreatic cancer by machine learning methods using urine biomarker combinations
    Acer, Irem
    Bulucu, Firat Orhan
    Icer, Semra
    Latifoglu, Fatma
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2023, 31 (01) : 112 - 125
  • [9] Neoadjuvant therapy and pancreatic cancer: a national cancer database analysis
    Shridhar, Ravi
    Takahashi, Caitlin
    Huston, Jamie
    Meredith, Kenneth L.
    [J]. JOURNAL OF GASTROINTESTINAL ONCOLOGY, 2019, 10 (04) : 663 - 673
  • [10] Primary Pancreatic Lymphoma: An Analysis of the National Cancer Database
    Abushalha, Kamelah
    Silberstein, Peter T.
    [J]. BLOOD, 2023, 142