A Novel Leukemia Gene Features Extraction and Selection Technique for Robust Type Prediction Using Machine Learning

Authors
Ilyas, Mahwish [1]
Aamir, Khalid Mahmood [1]
Jaleel, Abdul [2]
Deriche, Mohamed [3]
Affiliations
[1] Univ Sargodha, Dept Comp Sci & Informat Technol, Sargodha 40162, Punjab, Pakistan
[2] Univ Engn & Technol, Dept Comp Sci, GRW, RCET, Lahore, Pakistan
[3] Ajman Univ, Coll Engn & Informat Technol, Artificial Intelligence Res Ctr AIRC, Ajman, U Arab Emirates
Keywords
Leukemia prediction; Gene features extraction; Linear discriminant analysis; Dimensionality reduction; Expression data; Classification; Algorithm; Hybrid
DOI
10.1007/s13369-024-09254-5
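Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09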
Abstract
The broad term 'leukemia' refers to several types of cancer that affect blood cells. Detecting and identifying the specific type of leukemia remains a major challenge in medicine. Machine learning techniques can play a vital role in analyzing gene expression data from microarray experiments in leukemia research, although determining expression patterns from microarray data can be difficult. In this work, we use the Leukemia Gene Expression data from the Curated Microarray Database (CuMiDa). We combine Fisher's linear discriminant analysis, a popular technique for dimensionality reduction, with a new feature selection approach to predict leukemia from microarray data. Our machine learning model predicts five types of leukemia, including AML, PBSC CD34, Bone Marrow, and CD34 from the bone marrow. The data features are first rescaled. A feature selection technique then obtains the 25 most significant of the dataset's 22,283 features, and the dimension is further reduced to only 5 features to lower computational complexity. These features are fed into a Fisher's linear discriminant module and a likelihood-based index for classification. We examine the results using 2, 4, 5, 6, and 7 selected features. The best classification accuracies are 89.6%, 96.92%, and 96.15% for 2, 5, and 7 selected features, respectively. Our results outperform the state of the art by about 4%, with a task completion time of less than 100 ms.
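The pipeline outlined in the abstract (rescale, select a few features, classify with a likelihood-based rule) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify the selection criterion or the likelihood index, so the sketch assumes a univariate Fisher score for ranking and a Gaussian rule with a shared diagonal covariance as a simplified stand-in for Fisher's linear discriminant. The synthetic data and all function names (`fisher_scores`, `run_demo`, and so on) are hypothetical and do not come from CuMiDa or the paper.

```python
import math
import random


def fit_min_max(X):
    """Return per-feature (min, max) computed on the training rows."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(X, lo, hi):
    """Rescale features with training statistics; constant features map to 0."""
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
            for row in X]


def class_stats(X, y):
    """Class sizes, per-class feature means, and pooled within-class variances."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    sizes = {c: len(rows) for c, rows in groups.items()}
    means = {c: [sum(col) / len(rows) for col in zip(*rows)]
             for c, rows in groups.items()}
    dof = len(X) - len(groups)
    pooled = []
    for j in range(len(X[0])):
        ss = sum((row[j] - means[c][j]) ** 2
                 for c, rows in groups.items() for row in rows)
        pooled.append(ss / dof + 1e-12)  # small constant avoids division by zero
    return sizes, means, pooled


def fisher_scores(X, y):
    """Assumed criterion: between-class scatter over within-class variance."""
    sizes, means, pooled = class_stats(X, y)
    overall = [sum(col) / len(X) for col in zip(*X)]
    return [sum(sizes[c] * (means[c][j] - overall[j]) ** 2 for c in sizes) / pooled[j]
            for j in range(len(overall))]


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def project(X, idx):
    """Keep only the selected feature columns."""
    return [[row[j] for j in idx] for row in X]


def predict(x, sizes, means, pooled):
    """Pick the class with the highest Gaussian log-likelihood plus log prior."""
    n = sum(sizes.values())

    def index(c):
        quad = sum((v - m) ** 2 / s for v, m, s in zip(x, means[c], pooled))
        return math.log(sizes[c] / n) - 0.5 * quad

    return max(sizes, key=index)


def run_demo(seed=0, n_per_class=40, n_features=50, k=3):
    """Synthetic stand-in data: 3 classes, each shifted on one informative feature."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[c] += 4.0  # feature c separates class c from the others
            X.append(row)
            y.append(c)
    train = [i for i in range(len(X)) if i % 2 == 0]
    test = [i for i in range(len(X)) if i % 2 == 1]
    X_train, y_train = [X[i] for i in train], [y[i] for i in train]
    X_test, y_test = [X[i] for i in test], [y[i] for i in test]

    lo, hi = fit_min_max(X_train)
    X_train, X_test = apply_min_max(X_train, lo, hi), apply_min_max(X_test, lo, hi)

    selected = top_k(fisher_scores(X_train, y_train), k)
    sizes, means, pooled = class_stats(project(X_train, selected), y_train)
    hits = sum(predict(x, sizes, means, pooled) == label
               for x, label in zip(project(X_test, selected), y_test))
    return selected, hits / len(y_test)


if __name__ == "__main__":
    print(run_demo())
```

Calling `run_demo()` returns the selected feature indices and the held-out accuracy on the synthetic data. A full Fisher discriminant would use the complete pooled covariance matrix and project onto discriminant directions; the diagonal simplification here keeps the sketch dependency-free.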
Pages: 16845-16863
Number of pages: 19