Improving classification accuracy of cancer types using parallel hybrid feature selection on microarray gene expression data

Cited by: 18
Authors
Venkataramana, Lokeswari [1]
Jacob, Shomona Gracia
Ramadoss, Rajavel [2]
Saisuma, Dodda [1]
Haritha, Dommaraju [1]
Manoja, Kunthipuram [1]
Affiliations
[1] Sri Sivasubramaniya Nadar Coll Engn, Dept CSE, Chennai, Tamil Nadu, India
[2] Sri Sivasubramaniya Nadar Coll Engn, Dept ECE, Chennai, Tamil Nadu, India
Keywords
Parallelized hybrid feature selection; Correlation feature subset selection; Rank-based methods; Parallel classification; Spark; DistributedWekaSpark; PREDICTION
DOI
10.1007/s13258-019-00859-x
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Data mining techniques are used to extract previously unknown knowledge from large datasets. Microarray gene expression (MGE) data plays a major role in predicting the type of cancer. Because MGE data is large in volume, applying traditional data mining approaches is time-consuming, so parallel programming frameworks such as Hadoop, Spark and Mahout are needed to ease the computation.
Objective: Not all gene expressions are needed for prediction, so selecting the important genes is essential for improving classification accuracy. Feature selection algorithms are therefore parallelized and executed on the Spark framework to eliminate unnecessary genes and identify only the predictive genes in much less time, without affecting prediction accuracy.
Methods: A parallelized hybrid feature selection (HFS) method is proposed for this purpose. It applies parallelized correlation feature subset selection followed by rank-based feature selection methods. The selected subset of genes is evaluated using parallel classification algorithms. The resulting accuracy values are compared with those of existing rank-weight feature selection and parallelized recursive feature selection methods, and with the values obtained by executing parallelized HFS on DistributedWekaSpark.
Results: The classification accuracy obtained with the proposed parallelized HFS method is 97% for gastric cancer and 79% for childhood leukemia. The proposed method produced an improvement of approximately 4% to 15% in classification accuracy compared with previous methods.
Conclusion: The results indicate that the proposed parallelized feature selection algorithm scales to growing medical data and predicts cancer sub-types in less time and with higher accuracy.
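Illustrative sketch (not the authors' implementation): the two-stage idea in the Methods, a correlation-based subset filter followed by a rank-based cut, can be shown on a single machine in plain Python. Everything named here is an assumption made for illustration: the helper names (`pearson`, `cfs_merit`, `hybrid_select`), the use of absolute Pearson correlation for both gene-class and gene-gene relevance, the greedy forward search, and the thread pool standing in for Spark's distributed per-gene scoring. The abstract does not state which correlation measure, search strategy, or rank criteria the paper uses.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit: k * mean(r_cf) / sqrt(k + k*(k-1) * mean(r_ff))."""
    k = len(subset)
    mean_cf = sum(r_cf[g] for g in subset) / k
    if k == 1:
        return mean_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    mean_ff = sum(r_ff(a, b) for a, b in pairs) / len(pairs)
    return k * mean_cf / sqrt(k + k * (k - 1) * mean_ff)


def hybrid_select(genes, labels, top_k, workers=4):
    """genes: dict of gene name -> expression values; labels: numeric class per sample."""
    names = list(genes)

    # Per-gene relevance scores are independent, so they can be computed in
    # parallel. A thread pool is used here only as a stand-in for Spark.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda g: abs(pearson(genes[g], labels)), names))
    r_cf = dict(zip(names, scores))

    cache = {}

    def r_ff(a, b):
        # Gene-gene redundancy, computed lazily and cached per unordered pair.
        key = (a, b) if a < b else (b, a)
        if key not in cache:
            cache[key] = abs(pearson(genes[a], genes[b]))
        return cache[key]

    # Stage 1: greedy forward search that adds a gene only if the merit improves.
    subset, best = [], 0.0
    remaining = set(names)
    while remaining:
        cand, cand_merit = None, best
        for g in sorted(remaining):
            m = cfs_merit(subset + [g], r_cf, r_ff)
            if m > cand_merit:
                cand, cand_merit = g, m
        if cand is None:
            break
        subset.append(cand)
        remaining.remove(cand)
        best = cand_merit

    # Stage 2: rank the surviving genes by relevance and keep the top_k.
    ranked = sorted(subset, key=lambda g: r_cf[g], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy data (made up): six samples, three hypothetical genes.
    toy_labels = [0, 0, 0, 1, 1, 1]
    toy_genes = {
        "g_signal": [1, 2, 1, 8, 9, 8],
        "g_redundant": [1, 2, 2, 8, 9, 7],
        "g_noise": [5, 1, 4, 2, 5, 3],
    }
    print(hybrid_select(toy_genes, toy_labels, top_k=2))
```

In this sketch the merit function rewards genes that correlate with the class and penalizes genes that correlate with each other, so a near-duplicate of an already selected gene tends to be rejected in stage 1 before the ranking step is applied.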
Pages: 1301-1313
Number of pages: 13
Related papers
50 in total
  • [1] Improving classification accuracy of cancer types using parallel hybrid feature selection on microarray gene expression data
    Lokeswari Venkataramana
    Shomona Gracia Jacob
    Rajavel Ramadoss
    Dodda Saisuma
    Dommaraju Haritha
    Kunthipuram Manoja
    [J]. Genes & Genomics, 2019, 41 : 1301 - 1313
  • [2] A Survey on Hybrid Feature Selection Methods in Microarray Gene Expression Data for Cancer Classification
    Almugren, Nada
    Alshamlan, Hala
    [J]. IEEE ACCESS, 2019, 7 : 78533 - 78548
  • [3] A New hybrid Feature selection-Classification model to Improve Cancer Sample Classification Accuracy in Microarray Gene Expression Data
    Bandyopadhyay, Ritaban
    Sharma, Arijt Das
    Dasgupta, Bidya
    Ghosh, Ankita
    Das, Chandra
    Bose, Shilpi
    [J]. 2023 INTERNATIONAL CONFERENCE ON COMPUTER, ELECTRICAL & COMMUNICATION ENGINEERING, ICCECE, 2023,
  • [4] Feature Selection for Cancer Classification on Microarray Expression Data
    Hsu, Hui-Huang
    Lu, Ming-Da
    [J]. ISDA 2008: EIGHTH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, VOL 3, PROCEEDINGS, 2008, : 153 - 158
  • [5] Parallel classification and feature selection in microarray data using SPRINT
    Mitchell, Lawrence
    Sloan, Terence M.
    Mewissen, Muriel
    Ghazal, Peter
    Forster, Thorsten
    Piotrowski, Michal
    Trew, Arthur
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2014, 26 (04): : 854 - 865
  • [6] A hybrid feature selection approach for microarray gene expression data
    Tan, Feng
    Fu, Xuezheng
    Wang, Hao
    Zhang, Yanqing
    Bourgeois, Anu
    [J]. COMPUTATIONAL SCIENCE - ICCS 2006, PT 2, PROCEEDINGS, 2006, 3992 : 678 - 685
  • [7] Hybrid Feature Selection Algorithm mRMR-ICA for Cancer Classification from Microarray Gene Expression Data
    Wang, Shuaiqun
    Kong, Wei
    Aorigele
    Deng, Jin
    Gao, Shangce
    Zeng, Weiming
    [J]. COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2018, 21 (06) : 420 - 430
  • [8] Feature selection methods on gene expression microarray data for cancer classification: A systematic review
    Alhenawi, Esra'a
    Al-Sayyed, Rizik
    Hudaib, Amjad
    Mirjalili, Seyedali
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 140
  • [9] A discrete bacterial algorithm for feature selection in classification of microarray gene expression cancer data
    Wang, Hong
    Jing, Xingjian
    Niu, Ben
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 126 : 8 - 19
  • [10] Improving feature subset selection using a genetic algorithm for microarray gene expression data
    Tan, Feng
    Fu, Xuezheng
    Zhang, Yanqing
    Bourgeois, Anu G.
    [J]. 2006 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-6, 2006, : 2514 - 2519