A Hybrid Model of Cancer Diseases Diagnosis Based on Gene Expression Data with Joint Use of Data Mining Methods and Machine Learning Techniques

Cited by: 6
Authors
Babichev, Sergii [1 ,2 ]
Yasinska-Damri, Lyudmyla [3 ]
Liakh, Igor [4 ]
Affiliations
[1] Jan Evangelista Purkyne Univ Usti nad Labem, Dept Informat, Usti Nad Labem 40096, Czech Republic
[2] Kherson State Univ, Dept Phys, UA-73008 Kherson, Ukraine
[3] Ukrainian Acad Printing, Dept Comp Sci & Informat Technol, UA-79020 Lvov, Ukraine
[4] Uzhgorod Natl Univ, Dept Informat Sci & Phys & Math Disciplines, UA-88000 Uzhgorod, Ukraine
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 10
Keywords
gene expression profiles; spectral clustering algorithm; convolutional neural network; inductive clustering technique; random forest classifier; alternative voting method; hybrid model; cancer disease;
DOI
10.3390/app13106022
CLC number
O6 [Chemistry];
Subject classification code
0703 ;
Abstract
A current focus of bioinformatics is the development of hybrid models that process gene expression data to build diagnostic systems for various diseases. In this study, we propose such a model, combining an inductive spectral clustering algorithm, a random forest classifier, a convolutional neural network, and an alternative voting method that makes the final decision about the patient's condition. In the first stage, we apply the spectral clustering algorithm to gene expression profiles using inductive methods of objective clustering, calculating internal, external, and balance clustering quality criteria. This yields clusters of mutually correlated and differently expressed gene expression profiles. In the second stage, we apply the random forest classifier and the convolutional neural network to identify the examined objects, whose attributes are the gene expression values in the allocated clusters. The research addresses both binary and multiclass classification tasks. The final decision about the patient's condition is made by the alternative voting method, which considers the classification results obtained from the gene expression data in the various clusters. The simulation results showed that the proposed technique was highly effective, achieving high object identification accuracy with both classifiers. However, the convolutional neural network processed the data significantly more efficiently than the random forest algorithm because of its substantially shorter processing time.
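The final decision stage described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the alternative voting method, so plain majority voting over per-cluster predictions is substituted here as an assumption. The function names, the class labels, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(cluster_predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by the most clusters for one patient.

    Ties go to the tied label that appears first in the input order
    (an arbitrary choice made for this sketch, not the paper's rule).
    """
    if not cluster_predictions:
        raise ValueError("need at least one per-cluster prediction")
    counts = Counter(cluster_predictions)
    best = max(counts.values())
    for label in cluster_predictions:
        if counts[label] == best:
            return label


def vote_all(predictions_by_cluster: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-cluster predictions into one decision per patient.

    predictions_by_cluster[c][i] is the label that the classifier trained
    on gene cluster c assigned to patient i.
    """
    # zip(*...) regroups the labels so each tuple holds one patient's votes.
    return [majority_vote(votes) for votes in zip(*predictions_by_cluster)]


# Hypothetical labels from three gene clusters for four patients.
predictions_by_cluster = [
    ["tumor", "normal", "tumor", "normal"],
    ["tumor", "tumor", "normal", "normal"],
    ["normal", "tumor", "tumor", "normal"],
]
final = vote_all(predictions_by_cluster)
print(final)
```

In this sketch each cluster-specific classifier contributes one vote per patient, so a single cluster that disagrees with the others does not change the final decision.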
Pages: 19