Breast cancer prediction with transcriptome profiling using feature selection and machine learning methods

Cited by: 29
Authors
Taghizadeh, Eskandar [1]
Heydarheydari, Sahel [2]
Saberi, Alihossein [1]
JafarpoorNesheli, Shabnam [3]
Rezaeijo, Seyed Masoud [4]
Affiliations
[1] Ahvaz Jundishapur Univ Med Sci, Dept Med Genet, Fac Med, Ahvaz, Iran
[2] Shoushtar Fac Med Sci, Dept Radiol Technol, Shoushtar, Iran
[3] Univ Sci & Culture, Fac Engn, Tehran, Iran
[4] Ahvaz Jundishapur Univ Med Sci, Dept Med Phys, Fac Med, Ahvaz, Iran
Keywords
Breast cancer; Prediction; Transcriptome profiling; Feature selection; Machine learning; MicroRNAs
DOI
10.1186/s12859-022-04965-8
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: We used a hybrid machine learning system (HMLS) strategy: an extensive search for the best-performing HMLSs, each combining feature selection algorithms, a feature extraction algorithm, and classifiers, for diagnosing breast cancer. The study aims to identify a high-importance transcriptome profile, coupled with classification procedures, that can facilitate the early detection of breast cancer.
Methods: The study included 762 breast cancer patients and 138 solid tissue normal subjects. Three groups of machine learning (ML) algorithms were employed: (i) four feature selection procedures, compared to select the most valuable features: (1) ANOVA; (2) Mutual Information; (3) Extra Trees Classifier; and (4) Logistic Regression (LGR); (ii) a feature extraction algorithm (Principal Component Analysis); and (iii) 13 classification algorithms with automated ML hyperparameter tuning: (1) LGR; (2) Support Vector Machine; (3) Bagging; (4) Gaussian Naive Bayes; (5) Decision Tree; (6) Gradient Boosting Decision Tree; (7) K-Nearest Neighbors; (8) Bernoulli Naive Bayes; (9) Random Forest; (10) AdaBoost; (11) ExtraTrees; (12) Linear Discriminant Analysis; and (13) Multilayer Perceptron (MLP). Model performance was evaluated using balanced accuracy and area under the curve (AUC).
Results: The LGR feature selection procedure combined with the MLP classifier achieved the highest balanced accuracy and AUC (balanced accuracy: 0.86, AUC: 0.94), followed by LGR feature selection with the LGR classifier (balanced accuracy: 0.84, AUC: 0.94). The AUC for the LGR + LGR classifier was achieved with the following 20 biomarkers: TMEM212, SNORD115-13, ATP1A4, FRG2, CFHR4, ZCCHC13, FLJ46361, LY6G6E, ZNF323, KRT28, KRT25, LPPR5, C10orf99, PRKACG, SULT2A1, GRIN2C, EN2, GBA2, CUX2, and SNORA66.
Conclusions: The best performance was achieved using the LGR feature selection procedure and the MLP classifier. The 20 biomarkers listed above had the highest scores or rankings for breast cancer detection.
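The best-performing combination described in the abstract (LGR-based feature selection followed by an MLP classifier, scored with balanced accuracy and AUC) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The synthetic data, the use of `SelectFromModel` to keep the 20 largest logistic-regression coefficients, the MLP layer size, the 70/30 split, and the function name `run_sketch` are all assumptions made for illustration, and no hyperparameter tuning is performed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def run_sketch(n_selected=20, seed=0):
    """Hypothetical helper: LGR feature selection + MLP on synthetic data."""
    # Synthetic stand-in for an expression matrix (assumption): 900 samples
    # with roughly the 138 normal / 762 tumor imbalance from the abstract.
    X, y = make_classification(
        n_samples=900, n_features=200, n_informative=15,
        weights=[138 / 900, 762 / 900], random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    pipeline = Pipeline([
        ("scale", StandardScaler()),
        # Rank features by absolute logistic-regression coefficient and keep
        # the top n_selected; threshold=-inf makes max_features the only cut.
        ("select", SelectFromModel(
            LogisticRegression(max_iter=1000),
            threshold=-np.inf, max_features=n_selected,
        )),
        ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                              random_state=seed)),
    ])
    pipeline.fit(X_train, y_train)

    # AUC is computed from the predicted probability of the positive class.
    scores = pipeline.predict_proba(X_test)[:, 1]
    return {
        "n_selected": int(pipeline.named_steps["select"].get_support().sum()),
        "balanced_accuracy": balanced_accuracy_score(
            y_test, pipeline.predict(X_test)),
        "auc": roc_auc_score(y_test, scores),
    }


if __name__ == "__main__":
    print(run_sketch())
```

Placing the selector inside the pipeline means it is fitted only on the training split, so the held-out scores are not influenced by test samples. Balanced accuracy averages the per-class recall, which is why it is a more informative summary than plain accuracy when one class is several times larger than the other.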
Pages: 9
Journal: BMC Bioinformatics, 23