Ensemble of data mining methods for gene ranking

Cited by: 12

Authors:

Wilinski, A. [2]

Osowski, S. [1,3]

Affiliations:

[1] Warsaw Univ Technol, Inst Theory Elect Engn Measurement & Informat Sys, PL-00661 Warsaw, Poland

[2] Univ Life Sci, Fac Appl Informat & Math, PL-02776 Warsaw, Poland

[3] Mil Univ Technol, Inst Elect Syst, PL-00908 Warsaw, Poland

Source:

BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES | 2012 / Vol. 60 / Issue 03

Keywords:

gene expression array; feature selection; gene ranking methods; classification; SVM; SUPPORT VECTOR MACHINES; CANCER CLASSIFICATION; SELECTION;

DOI:

10.2478/v10175-012-0058-x

Chinese Library Classification:

T [Industrial Technology];

Subject classification code:

08 ;

Abstract:

The paper presents an ensemble of data mining methods for discovering the most important genes and gene sequences in gene expression array data that are responsible for the recognition of a particular type of cancer. The analyzed methods include the correlation of a feature with the class, statistical hypothesis testing, the Fisher measure of discrimination, and the linear Support Vector Machine (SVM) used to characterize the discrimination ability of the features. In the first step of ranking, each method is applied individually, and the genes most often selected in cross-validation of the available data set are chosen. In the next step, the results of the different selection methods are combined, and the genes appearing most frequently in the selected sets are chosen again. On this basis the final ranking of the genes is formed. The most important genes form the input delivered to the SVM classifier, which performs the final recognition of tumor from non-tumor data. Several ways of checking the correctness of the proposed ranking procedure have been applied. The first relies on mapping the distribution of the selected genes onto the two-coordinate system formed by the two most important principal components of the PCA transformation and applying cluster quality measures. The second depicts the results graphically by presenting the gene expressions as pixel intensities for the available data. The final confirmation of the quality of the proposed ranking method is provided by the classification results for recognizing cancer cases from non-cancer (normal) ones, obtained using the Gaussian kernel SVM. The results of selecting the most significant genes used by the SVM to distinguish prostate cancer cases from normal cases have confirmed the good accuracy of the approach. The presented methodology is of potential use for practical application in bioinformatics.
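
The two-step frequency voting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only two of the four measures named above (the Fisher measure and class correlation), a simple leave-one-fold-out split, and synthetic data. The function names, the cutoff `k`, the fold count `n_folds`, and the toy data generator are all assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def fisher_score(values, labels):
    """Fisher discrimination measure |m1 - m0| / (s1 + s0) for one gene."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)

def abs_correlation(values, labels):
    """Absolute Pearson correlation between one gene and the class label."""
    mv, ml = mean(values), mean(labels)
    cov = sum((v - mv) * (y - ml) for v, y in zip(values, labels))
    sv = sum((v - mv) ** 2 for v in values) ** 0.5
    sl = sum((y - ml) ** 2 for y in labels) ** 0.5
    return abs(cov) / (sv * sl + 1e-12)

def top_genes(X, y, score, k):
    """Indices of the k genes with the highest score on this data subset."""
    n_genes = len(X[0])
    scores = [score([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -scores[g])[:k]

def stage_one(X, y, score, k, n_folds):
    """Step 1: keep the genes most often selected across cross-validation folds."""
    counts = Counter()
    for f in range(n_folds):
        keep = [i for i in range(len(X)) if i % n_folds != f]  # leave fold f out
        counts.update(top_genes([X[i] for i in keep], [y[i] for i in keep], score, k))
    return [g for g, _ in counts.most_common(k)]

def ensemble_ranking(X, y, scorers, k=5, n_folds=5):
    """Step 2: count how many methods selected each gene, then rank by votes."""
    votes = Counter()
    for score in scorers:
        votes.update(stage_one(X, y, score, k, n_folds))
    return votes.most_common()

def make_toy_data(n_samples=40, n_genes=20, seed=0):
    """Synthetic expression matrix (hypothetical): only genes 0 and 1 carry class signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0, 1) + (3 * y[i] if g < 2 else 0) for g in range(n_genes)]
         for i in range(n_samples)]
    return X, y

X, y = make_toy_data()
ranking = ensemble_ranking(X, y, [fisher_score, abs_correlation])
print(ranking[:5])  # (gene index, number of methods that selected it)
```

In this sketch a gene's final position depends on how many methods kept it after the cross-validation step, so a gene favored by only one measure ranks below genes that all measures agree on. The top of such a ranking would then be passed to a classifier, which the abstract identifies as a Gaussian kernel SVM.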


Pages: 461-470

Page count: 10

Related records: 50 in total

[41] Feature Ranking for Multi-target Regression with Tree Ensemble Methods
Petkovic, Matej
Dzeroski, Saso
Kocev, Dragi
[J]. DISCOVERY SCIENCE, DS 2017, 2017, 10558 : 171 - 185
[42] Averaging-Based Ensemble Methods for the Partial Label Ranking Problem
Alfaro, Juan C.
Aledo, Juan A.
Gamez, Jose A.
[J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2020, 2020, 12344 : 410 - 423
[43] Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction
Re, Matteo
Valentini, Giorgio
[J]. PROCEEDINGS OF THE THIRD INTERNATIONAL WORKSHOP ON MACHINE LEARNING IN SYSTEMS BIOLOGY, 2010, 8 : 98 - 111
[44] Pareto-optimal methods for gene ranking
Hero, AO
Fleury, G
[J]. JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2004, 38 (03) : 259 - 275
[45] Pareto-Optimal Methods for Gene Ranking
Alfred O. Hero
Gilles Fleury
[J]. Journal of VLSI signal processing systems for signal, image and video technology, 2004, 38 : 259 - 275
[46] Ensemble feature ranking
Jong, K
Mary, J
Cornuejols, A
Marchiori, E
Sebag, M
[J]. KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2004, PROCEEDINGS, 2004, 3202 : 267 - 278
[47] Advanced Methods for Data Mining
David, Nicoleta
Patrascu, Neculai
Carstea, Claudia-Georgeta
Patrascu, Lucian
Ratiu, Ioan-Gheorghe
Damian, Daniela
[J]. PROCEEDINGS OF THE 8TH WSEAS INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING AND DATA BASES, 2009, : 407 - 412
[48] Research on Data Mining Methods
Chen, Ying
Luo, Cheng
[J]. PROCEEDINGS OF THE 2016 6TH INTERNATIONAL CONFERENCE ON MANAGEMENT, EDUCATION, INFORMATION AND CONTROL (MEICI 2016), 2016, 135 : 666 - 669
[49] Methods and problems in data mining
Mannila, H
[J]. DATABASE THEORY - ICDT'97, 1997, 1186 : 41 - 55
[50] Methods for mining HTS data
Harper, Gavin
Pickett, Stephen D.
[J]. DRUG DISCOVERY TODAY, 2006, 11 (15-16) : 694 - 699
