Improving classification accuracy of cancer types using parallel hybrid feature selection on microarray gene expression data

Cited by: 18

Authors:

Venkataramana, Lokeswari [1]

Jacob, Shomona Gracia

Ramadoss, Rajavel [2]

Saisuma, Dodda [1]

Haritha, Dommaraju [1]

Manoja, Kunthipuram [1]

Affiliations:

[1] Sri Sivasubramaniya Nadar Coll Engn, Dept CSE, Chennai, Tamil Nadu, India

[2] Sri Sivasubramaniya Nadar Coll Engn, Dept ECE, Chennai, Tamil Nadu, India

Source:

GENES & GENOMICS | 2019 / Vol. 41 / Issue 11

Keywords:

Parallelized hybrid feature selection; Correlation feature subset selection; Rank-based methods; Parallel classification; Spark; DistributedWekaSpark; PREDICTION;

DOI:

10.1007/s13258-019-00859-x

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Background: Data mining techniques extract previously unknown knowledge from large data sets. Microarray gene expression (MGE) data plays a major role in predicting the type of cancer, but because MGE data is large in volume, applying traditional data mining approaches is time-consuming. Parallel programming frameworks such as Hadoop, Spark and Mahout are therefore needed to ease the computation.

Objective: Not all gene expressions are needed for prediction, so selecting the important genes is essential for improving classification accuracy. Feature selection algorithms are therefore parallelized and executed on the Spark framework to eliminate unnecessary genes and identify only the predictive genes in much less time without affecting prediction accuracy.

Methods: A parallelized hybrid feature selection (HFS) method is proposed for this purpose. It applies parallelized correlation feature subset selection followed by rank-based feature selection methods. The selected subset of genes is evaluated using parallel classification algorithms. The resulting accuracy values are compared with those of the existing rank-weight feature selection and parallelized recursive feature selection methods, and with the values obtained by executing parallelized HFS on DistributedWekaSpark.

Results: The classification accuracy obtained with the proposed parallelized HFS method is 97% for gastric cancer and 79% for childhood leukemia. The proposed method improved classification accuracy by approximately 4% to 15% compared with previous methods.

Conclusion: The results indicate that the proposed parallelized feature selection algorithm scales to growing medical data and predicts cancer sub-types in less time and with higher accuracy.
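The two-stage idea in the Methods paragraph (a correlation-based subset step followed by a rank-based step) can be illustrated with a minimal, serial sketch. This is not the authors' implementation: it does not use Spark or DistributedWekaSpark, it substitutes a simple pairwise redundancy filter for correlation feature subset selection, and it ranks genes by absolute Pearson correlation with the class label. The function names `pearson` and `hybrid_select`, the parameters `redundancy_cutoff` and `top_k`, and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def hybrid_select(genes, labels, redundancy_cutoff=0.9, top_k=2):
    """Illustrative two-stage filter (an assumption, not the paper's algorithm).

    genes  -- dict mapping gene name -> list of expression values, one per sample
    labels -- list of numeric class labels, one per sample
    """
    # Relevance of a gene = |correlation between its expression and the label|.
    relevance = {g: abs(pearson(v, labels)) for g, v in genes.items()}

    # Stage 1 (simplified correlation-based subset): visit genes from most to
    # least relevant; keep a gene only if it is not highly correlated with any
    # gene that has already been kept.
    kept = []
    for g in sorted(genes, key=relevance.get, reverse=True):
        if all(abs(pearson(genes[g], genes[k])) < redundancy_cutoff for k in kept):
            kept.append(g)

    # Stage 2 (rank-based): rank the survivors by relevance and keep the top_k.
    return sorted(kept, key=relevance.get, reverse=True)[:top_k]


# Toy data: six samples, two classes (values are made up for illustration).
labels = [0, 0, 0, 1, 1, 1]
genes = {
    "g1": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],  # strongly tied to the class
    "g2": [2.0, 2.4, 1.8, 6.2, 5.8, 6.0],  # exact multiple of g1, so redundant
    "g3": [0.5, 1.5, 0.5, 1.5, 1.5, 1.5],  # moderately tied to the class
    "g4": [2.0, 3.0, 2.0, 3.0, 2.0, 3.0],  # mostly noise
}
selected = hybrid_select(genes, labels)
print(selected)
```

In this sketch the redundant copy `g2` is discarded in the first stage and the weakly informative `g4` is cut by the ranking in the second stage. In a parallel setting such as the one the abstract describes, the per-gene relevance scores are independent of each other and could be computed across workers, though how the paper partitions the work is not stated here.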


Pages: 1301-1313

Page count: 13

