Unsupervised feature selection algorithm for multiclass cancer classification of gene expression RNA-Seq data

Cited by: 33
Authors
Garcia-Diaz, Pilar [1]
Sanchez-Berriel, Isabel [2]
Martinez-Rojas, Juan A. [1]
Diez-Pascual, Ana M. [3]
Affiliations
[1] Univ Alcala, Polytech Sch, Dept Signal Theory & Commun, Madrid 28805, Spain
[2] Univ La Laguna, Higher Sch Engn & Technol, Dept Comp & Syst Engn, San Cristobal La Laguna 38200, Sc De Tenerife, Spain
[3] Univ Alcala, Fac Sci, Dept Analyt Chem Phys Chem & Chem Engn, Madrid 28805, Spain
Keywords
Gene expression cancer; Feature selection; Multi-classification; Grouping genetic algorithm; Extreme learning machine; Global optimization; Pan-cancer; Prediction
DOI
10.1016/j.ygeno.2019.11.004
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
This paper presents a Grouping Genetic Algorithm (GGA) to solve a maximally diverse grouping problem. The algorithm is applied to the classification of an unbalanced database of 801 gene expression RNA-Seq samples covering 5 types of cancer. Each sample comprises 20,531 genes. The GGA extracts several groups of genes that achieve high accuracy in multiclass classification. Accuracy was evaluated with an Extreme Learning Machine algorithm and was found to be slightly higher on balanced databases than on unbalanced ones. The final classification decision is made through a weighted majority vote among the groups of features. The proposed algorithm ultimately selects 49 genes, classifying samples with an average accuracy of 98.81% and a standard deviation of 0.0174.
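The weighted majority vote mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `weighted_majority_vote`, the array layout, and the choice of each group's validation accuracy as its weight are all assumptions made for illustration, since the abstract does not say how the weights are derived.

```python
import numpy as np


def weighted_majority_vote(predictions, weights, n_classes):
    """Combine per-group class predictions into one label per sample.

    predictions: (n_groups, n_samples) integer class labels, one row per
                 gene group (hypothetical layout).
    weights:     (n_groups,) non-negative weight per group; using each
                 group's validation accuracy is an assumption here.
    n_classes:   number of classes (5 cancer types in the abstract).
    """
    predictions = np.asarray(predictions)
    weights = np.asarray(weights, dtype=float)
    n_groups, n_samples = predictions.shape

    # scores[i, c] accumulates the total weight voting for class c on sample i.
    scores = np.zeros((n_samples, n_classes))
    sample_idx = np.arange(n_samples)
    for g in range(n_groups):
        # Each group adds its weight to the class it predicts for every sample.
        scores[sample_idx, predictions[g]] += weights[g]

    # The class with the largest accumulated weight wins for each sample.
    return scores.argmax(axis=1)


# Three hypothetical gene groups voting on three samples.
group_predictions = [
    [0, 1, 2],
    [0, 2, 2],
    [1, 2, 1],
]
group_weights = [0.90, 0.80, 0.95]
final_labels = weighted_majority_vote(group_predictions, group_weights, n_classes=5)
```

With weights, a single highly reliable group can outvote two weaker groups that agree with each other, which a plain majority vote would not allow.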
Pages: 1916-1925
Page count: 10
Related papers
50 in total
  • [1] Data Driven Feature Selection for RNA-Seq Differential Expression Analysis
    Han, Henry
    [J]. PATTERN RECOGNITION IN BIOINFORMATICS, PRIB 2014, 2014, 8626 : 114 - 115
  • [2] Analyzing RNA-Seq Gene Expression Data for Cancer Classification Through ML Approach
    Wahid, Abdul
    Banday, M. Tariq
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (09) : 798 - 810
  • [3] Bayesian Hyper-LASSO Classification for Feature Selection with Application to Endometrial Cancer RNA-seq Data
    Lai Jiang
    Celia M. T. Greenwood
    Weixin Yao
    Longhai Li
    [J]. Scientific Reports, 10
  • [4] Bayesian Hyper-LASSO Classification for Feature Selection with Application to Endometrial Cancer RNA-seq Data
    Jiang, Lai
    Greenwood, Celia M. T.
    Yao, Weixin
    Li, Longhai
    [J]. SCIENTIFIC REPORTS, 2020, 10 (01)
  • [5] MODEL-BASED FEATURE SELECTION AND CLUSTERING OF RNA-SEQ DATA FOR UNSUPERVISED SUBTYPE DISCOVERY
    Lim, David K.
    Rashid, Naim U.
    Ibrahim, Joseph G.
    [J]. ANNALS OF APPLIED STATISTICS, 2021, 15 (01) : 481 - 508
  • [6] Feature Selection in Single-Cell RNA-seq Data via a Genetic Algorithm
    Chatzilygeroudis, Konstantinos I.
    Vrahatis, Aristidis G.
    Tasoulis, Sotiris K.
    Vrahatis, Michael N.
    [J]. LEARNING AND INTELLIGENT OPTIMIZATION, LION 15, 2021, 12931 : 66 - 79
  • [7] Analyzing RNA-Seq Gene Expression Data Using Deep Learning Approaches for Cancer Classification
    Rukhsar, Laiqa
    Bangyal, Waqas Haider
    Ali Khan, Muhammad Sadiq
    Ag Ibrahim, Ag Asri
    Nisar, Kashif
    Rawat, Danda B.
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (04):
  • [8] An Integrated Feature Selection Algorithm for Cancer Classification using Gene Expression Data
    Ahmed, Saeed
    Kabir, Muhammad
    Ali, Zakir
    Arif, Muhammad
    Ali, Farman
    Yu, Dong-Jun
    [J]. COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2018, 21 (09) : 631 - 645
  • [9] Leukemia multiclass assessment and classification from Microarray and RNA-seq technologies integration at gene expression level
    Castillo, Daniel
    Manuel Galvez, Juan
    Herrera, Luis J.
    Rojas, Fernando
    Valenzuela, Olga
    Caba, Octavio
    Prados, Jose
    Rojas, Ignacio
    [J]. PLOS ONE, 2019, 14 (02):
  • [10] Feature Selection and Classification in gene expression cancer data
    Pavithra, D.
    Lakshmanan, B.
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS), 2017,