A Hybrid Model of Cancer Diseases Diagnosis Based on Gene Expression Data with Joint Use of Data Mining Methods and Machine Learning Techniques

Cited by: 6
Authors
Babichev, Sergii [1 ,2 ]
Yasinska-Damri, Lyudmyla [3 ]
Liakh, Igor [4 ]
Affiliations
[1] Jan Evangelista Purkyne Univ Usti nad Labem, Dept Informat, Usti Nad Labem 40096, Czech Republic
[2] Kherson State Univ, Dept Phys, UA-73008 Kherson, Ukraine
[3] Ukrainian Acad Printing, Dept Comp Sci & Informat Technol, UA-79020 Lvov, Ukraine
[4] Uzhgorod Natl Univ, Dept Informat Sci & Phys & Math Disciplines, UA-88000 Uzhgorod, Ukraine
Source
APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 10
Keywords
gene expression profiles; spectral clustering algorithm; convolutional neural network; inductive clustering technique; random forest classifier; alternative voting method; hybrid model; cancer disease;
DOI
10.3390/app13106022
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
One current focus of modern bioinformatics is the development of hybrid models that process gene expression data in order to build diagnostic systems for various diseases. In this study, we propose a solution that combines an inductive spectral clustering algorithm, a random forest classifier, a convolutional neural network, and an alternative voting method for making the final decision about a patient's condition. In the first stage, we apply the spectral clustering algorithm to gene expression profiles using inductive methods of objective clustering, calculating internal, external, and balance clustering quality criteria. This yields clusters of mutually correlated and differentially expressed gene expression profiles. In the second stage, we apply the random forest classifier and the convolutional neural network to identify the examined objects, whose attributes are the gene expression values in the allocated clusters. The presented research addresses both binary and multiclass classification tasks. The final decision about the patient's condition is made using the alternative voting method, which takes into account the classification results obtained from the gene expression data in the various clusters. The simulation results showed that the proposed technique was highly effective, achieving high accuracy in object identification with both classifiers. However, the convolutional neural network processed the data significantly more efficiently than the random forest algorithm, owing to its substantially shorter processing time.
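The final decision step described in the abstract, where per-cluster classification results are combined into one verdict, can be illustrated with a minimal sketch. The abstract does not specify how the alternative voting method works, so the code below substitutes a plain majority vote as a stand-in. The function name `majority_vote`, the labels, and the tie-breaking rule are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def majority_vote(cluster_predictions):
    """Return the label predicted for the largest number of gene clusters.

    This is a plain majority vote, used here only as a stand-in for the
    paper's alternative voting method, whose details the abstract omits.
    Ties go to the tied label that appears first in the input, which is
    an arbitrary choice made for this sketch.
    """
    if not cluster_predictions:
        raise ValueError("at least one per-cluster prediction is required")
    counts = Counter(cluster_predictions)
    best = max(counts.values())
    # Scan in input order so that tie-breaking is deterministic.
    for label in cluster_predictions:
        if counts[label] == best:
            return label


# Hypothetical classifier outputs for one patient, one label per gene cluster.
per_cluster = ["tumor", "normal", "tumor", "tumor", "normal"]
print(majority_vote(per_cluster))
```

In a full pipeline, each entry of `per_cluster` would come from a classifier trained on the expression values of one gene cluster; here the labels are hard-coded.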
Pages: 19