ECMarker: interpretable machine learning model identifies gene expression biomarkers predicting clinical outcomes and reveals molecular mechanisms of human disease in early stages

Cited by: 18
Authors
Jin, Ting [1]
Nguyen, Nam D. [2]
Talos, Flaminia [3, 4, 5]
Wang, Daifeng [1, 6]
Affiliations
[1] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53706 USA
[2] SUNY Stony Brook, Dept Comp Sci, Stony Brook, NY 11794 USA
[3] SUNY Stony Brook, Dept Pathol, Stony Brook, NY 11794 USA
[4] SUNY Stony Brook, Dept Urol, Stony Brook, NY 11794 USA
[5] Stony Brook Canc Ctr, Stony Brook Med, Stony Brook, NY 11794 USA
[6] Univ Wisconsin, Waisman Ctr, Madison, WI 53705 USA
Funding
National Institutes of Health (NIH), USA
Keywords
CELL LUNG-CANCER; TRANSCRIPTION FACTOR; ENRICHMENT ANALYSIS; INHIBITOR; SENSITIVITY; TARGET; BREAST; EGFR;
DOI
10.1093/bioinformatics/btaa935
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Gene expression and regulation, key molecular mechanisms driving human disease development, remain elusive, especially at early stages. Integrating the growing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are still challenging. Machine learning has emerged as a way to address these challenges, but many machine learning methods are limited to building an accurate prediction model as a 'black box' and provide little biological or clinical interpretability.
Results: To address these challenges, we developed an interpretable and scalable machine learning model, ECMarker, to predict gene expression biomarkers for disease phenotypes and simultaneously reveal the underlying regulatory mechanisms. In particular, ECMarker is built on the integration of semi- and discriminative restricted Boltzmann machines, a neural network model for classification that allows lateral connections at the input gene layer. This interpretable model is scalable without any prior feature selection; it directly models and prioritizes genes and reveals potential gene networks (from the lateral connections) for the phenotypes. Applied to gene expression data from non-small-cell lung cancer patients, ECMarker not only achieved relatively high accuracy in predicting cancer stages but also identified biomarker genes and gene networks that imply regulatory mechanisms in lung cancer development. In addition, ECMarker demonstrates clinical interpretability: its prioritized biomarker genes can predict survival rates of early-stage lung cancer patients (P-value < 0.005). Finally, we identified a number of drugs, currently in clinical use for late stages or for other cancers, that have effects on these early lung cancer biomarkers, suggesting potential novel candidates for early cancer medicine.
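As a rough illustration of the classification component named in the abstract, the following sketch computes class probabilities in a generic discriminative restricted Boltzmann machine. It is not ECMarker's code: the function name `drbm_class_probs`, the parameter names `W`, `U`, `c`, `d`, the toy sizes, and the random weights are all assumptions made for illustration. Training and the lateral gene-gene connections are left out; under a standard energy function the lateral term does not depend on the class label and so would cancel in p(y | x), but how ECMarker actually learns and uses those connections is not described in this record.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def drbm_class_probs(x, W, U, c, d):
    """Class probabilities p(y | x) for a generic discriminative RBM.

    x: (n_genes,) expression vector
    W: (n_hidden, n_genes) gene-to-hidden weights
    U: (n_hidden, n_classes) class-to-hidden weights
    c: (n_hidden,) hidden biases
    d: (n_classes,) class biases
    """
    # Hidden pre-activation for every candidate class: (n_hidden, n_classes).
    pre = (W @ x + c)[:, None] + U
    # Unnormalized log-probability (negative free energy) per class.
    scores = d + softplus(pre).sum(axis=0)
    scores = scores - scores.max()  # stabilize the softmax
    p = np.exp(scores)
    return p / p.sum()


# Toy example with hypothetical sizes and random (untrained) parameters.
rng = np.random.default_rng(0)
n_genes, n_hidden, n_classes = 6, 4, 2
W = rng.normal(scale=0.1, size=(n_hidden, n_genes))
U = rng.normal(scale=0.1, size=(n_hidden, n_classes))
c = np.zeros(n_hidden)
d = np.zeros(n_classes)
x = rng.random(n_genes)

probs = drbm_class_probs(x, W, U, c, d)
print(probs)
```

In a sketch like this, the rows of `W` tie individual genes to hidden units, which is one place a gene-level ranking could be read from; the ranking procedure ECMarker actually uses is not specified here.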
Pages: 1115-1124
Page count: 10
Related papers
15 in total
  • [1] Interpretable Machine Learning Approach Reveals Developmental Gene Expression Biomarkers for Cancer Patient Outcomes at Early Stages
    Kamat, Alisha
    Jin, Ting
    Min, So Yeon
    Talos, Flaminia
    Almeida, Jonas
    Wang, Daifeng
    ACM-BCB'18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2018, : 510 - 510
  • [2] An Interpretable Machine Learning Model for Predicting Long-Term Clinical Outcomes in Recurrent Pericarditis
    Yesilyaprak, Abdullah
    Kumar, Ashwin
    Furqan, Muhammad M.
    Verma, Beni R.
    Agrawal, Ankit
    Syed, Alveena
    Akyuz, Kevser
    Wang, Tom Kai Ming K.
    Cremer, Paul C.
    Klein, Allan L.
    CIRCULATION, 2022, 146
  • [3] Gene Expression Profiling Identifies Two Chordoma Subtypes Associated with Distinct Molecular Mechanisms and Clinical Outcomes
    Bai, Jiwei
    Shi, Jianxin
    Zhang, Yazhuo
    Li, Chuzhong
    Xiong, Yujia
    Koka, Hela
    Wang, Difei
    Zhang, Tongwu
    Song, Lei
    Luo, Wen
    Zhu, Bin
    Hicks, Belynda
    Hutchinson, Amy
    Kirk, Erin
    Troester, Melissa A.
    Li, Mingxuan
    Shen, Yutao
    Ma, Tianshun
    Wang, Junmei
    Liu, Xing
    Wang, Shuai
    Gui, Songbai
    McMaster, Mary L.
    Chanock, Stephen J.
    Parry, Dilys M.
    Goldstein, Alisa M.
    Yang, Xiaohong R.
    CLINICAL CANCER RESEARCH, 2023, 29 (01) : 261 - 270
  • [4] Gene expression profiling reveals underlying molecular mechanisms of the early stages of tamoxifen-induced rat hepatocarcinogenesis
    Pogribny, Igor P.
    Bagnyukova, Tetyana V.
    Tryndyak, Volodymyr P.
    Muskhelishvili, Levan
    Rodriguez-Juarez, Rocio
    Kovalchuk, Olga
    Han, Tao
    Fuscoe, James C.
    Ross, Sharon A.
    Beland, Frederick A.
    TOXICOLOGY AND APPLIED PHARMACOLOGY, 2007, 225 (01) : 61 - 69
  • [5] Decoding Diabetes Biomarkers and Related Molecular Mechanisms by Using Machine Learning, Text Mining, and Gene Expression Analysis
    Elsherbini, Amira M.
    Alsamman, Alsamman M.
    Elsherbiny, Nehal M.
    El-Sherbiny, Mohamed
    Ahmed, Rehab
    Ebrahim, Hasnaa Ali
    Bakkach, Joaira
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2022, 19 (21)
  • [6] Machine learning gene expression predicting model for ustekinumab response in patients with Crohn's disease
    He, Manrong
    Li, Chao
    Tang, Wanxin
    Kang, Yingxi
    Zuo, Yongdi
    Wang, Yufang
    IMMUNITY INFLAMMATION AND DISEASE, 2021, 9 (04) : 1529 - 1540
  • [7] Machine learning analysis of gene expression data reveals novel diagnostic and prognostic biomarkers and identifies therapeutic targets for soft tissue sarcomas
    van IJzendoorn, David G. P.
    Szuhai, Karoly
    Briaire-de Bruijn, Inge H.
    Kostine, Marie
    Kuijjer, Marieke L.
    Bovee, Judith V. M. G.
    PLOS COMPUTATIONAL BIOLOGY, 2019, 15 (02)
  • [8] Linking gene expression to clinical outcomes in pediatric Crohn's disease using machine learning
    Chen, Kevin A.
    Nishiyama, Nina C.
    Ng, Meaghan M. Kennedy
    Shumway, Alexandria
    Joisa, Chinmaya U.
    Schaner, Matthew R.
    Lian, Grace
    Beasley, Caroline
    Zhu, Lee-Ching
    Bantumilli, Surekha
    Kapadia, Muneera R.
    Gomez, Shawn M.
    Furey, Terrence S.
    Sheikh, Shehzad Z.
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [10] Predicting diagnostic gene biomarkers in patients with diabetic kidney disease based on weighted gene co expression network analysis and machine learning algorithms
    Gao, Qian
    Jin, Huawei
    Xu, Wenfang
    Wang, Yanan
    MEDICINE, 2023, 102 (43) : E35618