Identification of Breast Cancer Metastasis Markers from Gene Expression Profiles Using Machine Learning Approaches

Times cited: 4
Authors
Jung, Jinmyung [1]
Yoo, Sunyong [2]
Affiliations
[1] Univ Suwon, Coll Informat & Commun Technol, Div Data Sci, Hwaseong 18323, South Korea
[2] Chonnam Natl Univ, Dept ICT Convergence Syst Engn, Gwangju 61005, South Korea
Funding
National Research Foundation, Singapore
Keywords
metastasis marker; gene expression; machine learning; XGBoost; breast cancer; feature importance; PROTEIN; REGULATOR; RESOURCE
DOI
10.3390/genes14091820
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Cancer metastasis accounts for approximately 90% of cancer deaths, and identifying metastasis markers is the first step toward its prevention. To characterize metastasis marker genes (MGs) of breast cancer, XGBoost models that classify metastasis status were trained on gene expression profiles from TCGA. A metastasis score (MS) was then assigned to each gene by computing the inner product between the models' feature importances and their AUC performance. Using empirical p-value cutoffs of 0.001, 0.005, and 0.01, the 54, 202, and 357 genes with the highest MS, respectively, were characterized as MGs. The three sets of MGs were compared with those in existing metastasis marker databases, and most comparisons yielded significant results (p-value < 0.05). The MGs were also significantly enriched in biological processes associated with breast cancer metastasis. Three MGs, SPPL2C, KRT23, and RGS7, showed highly significant results (p-value < 0.01) in survival analysis. MGs that could not be identified by statistical analysis (e.g., GOLM1, ELAVL1, UBP1, and AZGP1), as well as the MGs with the highest MS (e.g., ZNF676, FAM163B, LDOC2, IRF1, and STK40), were verified against the literature. Additionally, we examined how close the MGs were to each other in protein-protein interaction networks. We expect that the characterized markers will help in understanding and preventing breast cancer metastasis.
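The metastasis score described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes one row of per-gene feature importances per trained model and computes each gene's MS as the AUC-weighted sum of its importances across models (the inner product named in the abstract). The abstract does not say how the null distribution for the empirical p-value was built, so here it is assumed to be supplied by the caller (for example, scores from models trained on permuted metastasis labels). The function names, the add-one p-value formula, and the `<=` cutoff rule are hypothetical choices for illustration.

```python
from typing import List, Sequence


def metastasis_scores(importances: Sequence[Sequence[float]],
                      aucs: Sequence[float]) -> List[float]:
    """Return one MS per gene: MS_g = sum over models m of importances[m][g] * aucs[m].

    importances[m][g] is the feature importance of gene g in model m,
    and aucs[m] is the AUC of model m.
    """
    if len(importances) != len(aucs):
        raise ValueError("need exactly one AUC per model")
    if not importances:
        return []
    n_genes = len(importances[0])
    scores = [0.0] * n_genes
    for row, auc in zip(importances, aucs):
        if len(row) != n_genes:
            raise ValueError("every model must report the same genes")
        # Accumulate this model's AUC-weighted importance for every gene.
        for g, imp in enumerate(row):
            scores[g] += imp * auc
    return scores


def empirical_p_value(score: float, null_scores: Sequence[float]) -> float:
    """Add-one empirical p-value: share of null scores >= the observed score.

    Assumption: the null scores come from the caller; the source does not
    state how its null distribution was generated.
    """
    exceed = sum(1 for s in null_scores if s >= score)
    return (1 + exceed) / (1 + len(null_scores))


def select_markers(genes: Sequence[str], scores: Sequence[float],
                   null_scores: Sequence[float], cutoff: float) -> List[str]:
    """Keep genes whose empirical p-value is at or below the cutoff (hypothetical rule)."""
    return [gene for gene, s in zip(genes, scores)
            if empirical_p_value(s, null_scores) <= cutoff]


if __name__ == "__main__":
    # Toy data: 2 models x 3 genes (illustrative numbers, not from the paper).
    imps = [[0.5, 0.3, 0.2],
            [0.1, 0.6, 0.3]]
    ms = metastasis_scores(imps, [0.9, 0.8])
    print([round(s, 2) for s in ms])  # [0.53, 0.75, 0.42]
```

With real data, each importance row could come from a fitted model's feature importance vector (for example the `feature_importances_` attribute of an `xgboost.XGBClassifier`), and the abstract's cutoffs of 0.001, 0.005, and 0.01 would be passed as `cutoff`. Weighting by AUC lets better-performing models contribute more to a gene's score.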
Pages: 11