Ensemble of data mining methods for gene ranking

Cited by: 12
Authors
Wilinski, A. [2]
Osowski, S. [1, 3]
Affiliations
[1] Warsaw Univ Technol, Inst Theory Elect Engn Measurement & Informat Sys, PL-00661 Warsaw, Poland
[2] Univ Life Sci, Fac Appl Informat & Math, PL-02776 Warsaw, Poland
[3] Mil Univ Technol, Inst Elect Syst, PL-00908 Warsaw, Poland
Keywords
gene expression array; feature selection; gene ranking methods; classification; SVM; SUPPORT VECTOR MACHINES; CANCER CLASSIFICATION; SELECTION;
DOI
10.2478/v10175-012-0058-x
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
This paper presents an ensemble of data mining methods for discovering the most important genes and gene sequences, as measured by gene expression arrays, that are responsible for the recognition of a particular type of cancer. The analyzed methods include the correlation of a feature with the class, statistical hypothesis testing, the Fisher measure of discrimination, and a linear Support Vector Machine used to characterize the discriminative ability of the features. In the first step of ranking, each method is applied individually, and the genes most often selected in cross-validation of the available data set are chosen. In the next step, the results of the different selection methods are combined, and the genes appearing most frequently in the selected sets are chosen once again. On this basis the final ranking of the genes is formed. The most important genes form the input to the Support Vector Machine (SVM) classifier, which is responsible for the final recognition of tumor versus non-tumor data. Several ways of checking the correctness of the proposed ranking procedure have been applied. The first relies on mapping the distribution of the selected genes onto the two-coordinate system formed by the two most important principal components of the PCA transformation and applying cluster quality measures. The second depicts the results graphically by presenting the gene expressions as pixel intensities for the available data. The final confirmation of the quality of the proposed ranking method is provided by the classification results for recognizing cancer cases versus non-cancer (normal) ones, obtained using a Gaussian kernel SVM. The results of selecting the most significant genes used by the SVM to distinguish prostate cancer cases from normal cases confirm the good accuracy of the approach. The presented methodology is of potential use for practical applications in bioinformatics.
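The two-step voting scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only two of the four scoring methods (the Fisher measure and class correlation, leaving out the hypothesis tests and the linear SVM), it approximates "cross validation" by leaving out one fold at a time, and the function names, the fold count, the set size `k`, and the synthetic data are all assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def fisher_score(values, labels):
    """Fisher discrimination measure |m1 - m0| / (s1 + s0) for one gene."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)

def correlation_score(values, labels):
    """Absolute Pearson correlation between one gene and the class label."""
    mv, ml = mean(values), mean(labels)
    cov = sum((v - mv) * (y - ml) for v, y in zip(values, labels))
    norm = (sum((v - mv) ** 2 for v in values)
            * sum((y - ml) ** 2 for y in labels)) ** 0.5
    return abs(cov) / (norm + 1e-12)

def top_genes(X, y, score, k):
    """Indices of the k best-scoring genes (columns of X)."""
    n_genes = len(X[0])
    scores = [score([row[g] for row in X], y) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: -scores[g])[:k]

def stable_genes(X, y, score, k, n_folds=5, seed=0):
    """Step 1: genes most often selected when each fold is left out in turn."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    votes = Counter()
    for f in range(n_folds):
        train = [i for j, i in enumerate(idx) if j % n_folds != f]
        votes.update(top_genes([X[i] for i in train],
                               [y[i] for i in train], score, k))
    return [g for g, _ in votes.most_common(k)]

def ensemble_ranking(X, y, scorers, k):
    """Step 2: rank genes by how many methods put them in their stable set."""
    votes = Counter()
    for score in scorers:
        votes.update(stable_genes(X, y, score, k))
    return [g for g, _ in votes.most_common()]

# Synthetic stand-in for an expression array (assumption): 40 samples,
# 30 genes, of which only genes 0 and 1 differ between the two classes.
rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[rng.gauss(4.0 * label if g < 2 else 0.0, 1.0) for g in range(30)]
     for label in y]

ranking = ensemble_ranking(X, y, [fisher_score, correlation_score], k=5)
print(ranking[:5])
```

In a fuller pipeline the leading genes of `ranking` would be passed to a classifier such as the Gaussian kernel SVM mentioned in the abstract; that step is omitted here to keep the sketch dependency-free.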
Pages: 461-470
Page count: 10
Related papers
50 in total
  • [41] Feature Ranking for Multi-target Regression with Tree Ensemble Methods
    Petkovic, Matej
    Dzeroski, Saso
    Kocev, Dragi
    [J]. DISCOVERY SCIENCE, DS 2017, 2017, 10558 : 171 - 185
  • [42] Averaging-Based Ensemble Methods for the Partial Label Ranking Problem
    Alfaro, Juan C.
    Aledo, Juan A.
    Gamez, Jose A.
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2020, 2020, 12344 : 410 - 423
  • [43] Simple ensemble methods are competitive with state-of-the-art data integration methods for gene function prediction
    Re, Matteo
    Valentini, Giorgio
    [J]. PROCEEDINGS OF THE THIRD INTERNATIONAL WORKSHOP ON MACHINE LEARNING IN SYSTEMS BIOLOGY, 2010, 8 : 98 - 111
  • [44] Pareto-optimal methods for gene ranking
    Hero, AO
    Fleury, G
    [J]. JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2004, 38 (03) : 259 - 275
  • [45] Pareto-Optimal Methods for Gene Ranking
    Alfred O. Hero
    Gilles Fleury
    [J]. Journal of VLSI signal processing systems for signal, image and video technology, 2004, 38 : 259 - 275
  • [46] Ensemble feature ranking
    Jong, K
    Mary, J
    Cornuejols, A
    Marchiori, E
    Sebag, M
    [J]. KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2004, PROCEEDINGS, 2004, 3202 : 267 - 278
  • [47] Advanced Methods for Data Mining
    David, Nicoleta
    Patrascu, Neculai
    Carstea, Claudia-Georgeta
    Patrascu, Lucian
    Ratiu, Ioan-Gheorghe
    Damian, Daniela
    [J]. PROCEEDINGS OF THE 8TH WSEAS INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING AND DATA BASES, 2009, : 407 - 412
  • [48] Research on Data Mining Methods
    Chen, Ying
    Luo, Cheng
    [J]. PROCEEDINGS OF THE 2016 6TH INTERNATIONAL CONFERENCE ON MANAGEMENT, EDUCATION, INFORMATION AND CONTROL (MEICI 2016), 2016, 135 : 666 - 669
  • [49] Methods and problems in data mining
    Mannila, H
    [J]. DATABASE THEORY - ICDT'97, 1997, 1186 : 41 - 55
  • [50] Methods for mining HTS data
    Harper, Gavin
    Pickett, Stephen D.
    [J]. DRUG DISCOVERY TODAY, 2006, 11 (15-16) : 694 - 699