Breast cancer prediction with transcriptome profiling using feature selection and machine learning methods

Cited by: 29
Authors
Taghizadeh, Eskandar [1]
Heydarheydari, Sahel [2]
Saberi, Alihossein [1]
JafarpoorNesheli, Shabnam [3]
Rezaeijo, Seyed Masoud [4]
Affiliations
[1] Ahvaz Jundishapur Univ Med Sci, Dept Med Genet, Fac Med, Ahvaz, Iran
[2] Shoushtar Fac Med Sci, Dept Radiol Technol, Shoushtar, Iran
[3] Univ Sci & Culture, Fac Engn, Tehran, Iran
[4] Ahvaz Jundishapur Univ Med Sci, Dept Med Phys, Fac Med, Ahvaz, Iran
Keywords
Breast cancer; Prediction; Transcriptome profiling; Feature selection; Machine learning; MICRORNAS;
DOI
10.1186/s12859-022-04965-8
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background We used a hybrid machine learning system (HMLS) strategy involving an extensive search for the best-performing HMLSs, each combining feature selection algorithms, a feature extraction algorithm, and classifiers for diagnosing breast cancer. This study aims to obtain a high-importance transcriptome profile, linked with classification procedures, that can facilitate the early detection of breast cancer.
Methods The study included 762 breast cancer patients and 138 solid tissue normal subjects. Three groups of machine learning (ML) algorithms were employed: (i) four feature selection procedures, compared to select the most valuable features: (1) ANOVA; (2) Mutual Information; (3) Extra Trees Classifier; and (4) Logistic Regression (LGR); (ii) a feature extraction algorithm (Principal Component Analysis); and (iii) 13 classification algorithms with automated ML hyperparameter tuning: (1) LGR; (2) Support Vector Machine; (3) Bagging; (4) Gaussian Naive Bayes; (5) Decision Tree; (6) Gradient Boosting Decision Tree; (7) K-Nearest Neighbors; (8) Bernoulli Naive Bayes; (9) Random Forest; (10) AdaBoost; (11) ExtraTrees; (12) Linear Discriminant Analysis; and (13) Multilayer Perceptron (MLP). Balanced accuracy and area under the curve (AUC) were used to evaluate the models' performance.
Results The LGR feature selection procedure combined with the MLP classifier achieved the highest balanced accuracy and AUC (balanced accuracy = 0.86, AUC = 0.94), followed by LGR feature selection with the LGR classifier (balanced accuracy = 0.84, AUC = 0.94). The AUC of the LGR + LGR model was obtained with the following 20 biomarkers: TMEM212, SNORD115-13, ATP1A4, FRG2, CFHR4, ZCCHC13, FLJ46361, LY6G6E, ZNF323, KRT28, KRT25, LPPR5, C10orf99, PRKACG, SULT2A1, GRIN2C, EN2, GBA2, CUX2, and SNORA66.
Conclusions The best performance was achieved using the LGR feature selection procedure and the MLP classifier. The 20 biomarkers listed above had the highest scores or rankings for breast cancer detection.
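Illustrative sketch (not the authors' code): the LGR feature selection + MLP classifier combination described in the abstract could look roughly like the following in scikit-learn. Everything here is an assumption made for illustration: the data are synthetic (generated to mimic only the 762:138 class ratio), the hyperparameters are arbitrary rather than tuned, and the choice of scikit-learn, the train/test split, and all variable names are hypothetical. The scores it produces say nothing about the published results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix (assumption, not real data):
# 900 samples, about 138 "normal" (class 0) and 762 "tumor" (class 1).
X, y = make_classification(
    n_samples=900,
    n_features=500,
    n_informative=20,
    n_redundant=0,
    weights=[138 / 900, 762 / 900],
    random_state=0,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = Pipeline(
    [
        ("scale", StandardScaler()),
        # Keep the 20 features with the largest absolute LGR coefficients.
        # threshold=-np.inf makes max_features the only selection criterion.
        (
            "select",
            SelectFromModel(
                LogisticRegression(max_iter=1000),
                max_features=20,
                threshold=-np.inf,
            ),
        ),
        # Small MLP with arbitrary, untuned settings.
        ("clf", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)),
    ]
)
model.fit(X_train, y_train)

# Balanced accuracy uses hard labels; AUC uses the positive-class probability.
bal_acc = balanced_accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
n_selected = int(model.named_steps["select"].get_support().sum())

print(f"selected features: {n_selected}")
print(f"balanced accuracy: {bal_acc:.2f}, AUC: {auc:.2f}")
```

Placing the selector inside the pipeline means features are chosen from the training split only, which avoids leaking test information into the selection step. Balanced accuracy is a natural fit for a cohort this imbalanced because it averages the recall of each class rather than rewarding majority-class predictions.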
Pages: 9