Ensemble of data mining methods for gene ranking

Cited by: 12
Authors
Wilinski, A. [2]
Osowski, S. [1, 3]
Affiliations
[1] Warsaw Univ Technol, Inst Theory Elect Engn Measurement & Informat Sys, PL-00661 Warsaw, Poland
[2] Univ Life Sci, Fac Appl Informat & Math, PL-02776 Warsaw, Poland
[3] Mil Univ Technol, Inst Elect Syst, PL-00908 Warsaw, Poland
Keywords
gene expression array; feature selection; gene ranking methods; classification; SVM; SUPPORT VECTOR MACHINES; CANCER CLASSIFICATION; SELECTION;
DOI
10.2478/v10175-012-0058-x
Chinese Library Classification number
T [Industrial Technology]
Subject classification code
08
Abstract
The paper presents an ensemble of data mining methods for discovering the most important genes and gene sequences, generated by gene expression arrays, that are responsible for the recognition of a particular type of cancer. The analyzed methods include the correlation of a feature with the class, statistical hypothesis testing, the Fisher measure of discrimination, and the linear Support Vector Machine (SVM) used to characterize the discrimination ability of the features. In the first step of ranking, each method is applied individually, and the genes most often selected in the cross-validation of the available data set are chosen. In the next step, the results of the different selection methods are combined, and the genes appearing most frequently in the selected sets are chosen once again. This forms the basis of the final gene ranking. The most important genes form the input to the SVM classifier, which is responsible for the final recognition of tumor versus non-tumor data. The correctness of the proposed ranking procedure was checked in several ways. The first relies on mapping the distribution of the selected genes onto the two-coordinate system formed by the two most important principal components of the PCA transformation and applying cluster quality measures. The second presents the results graphically, showing the gene expressions as pixel intensities for the available data. The final confirmation of the quality of the proposed ranking method is provided by the classification results for distinguishing cancer cases from non-cancer (normal) ones, obtained with a Gaussian kernel SVM. The results of selecting the most significant genes used by the SVM to distinguish prostate cancer cases from normal cases confirmed good accuracy. The presented methodology is of potential use for practical application in bioinformatics.
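The two-step voting scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the particular form of the Fisher measure, the use of a Welch-style t statistic for the hypothesis-testing ranker, and the synthetic data are all assumptions made for illustration. The linear-SVM ranker and the final Gaussian kernel SVM classifier are omitted to keep the example dependent on NumPy only.

```python
import numpy as np

def correlation_score(X, y):
    # Absolute Pearson correlation of each gene with the class label.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def fisher_score(X, y):
    # One common form of the Fisher measure (assumed here):
    # |mean_0 - mean_1| / (std_0 + std_1).
    a, b = X[y == 0], X[y == 1]
    return np.abs(a.mean(0) - b.mean(0)) / (a.std(0) + b.std(0) + 1e-12)

def t_score(X, y):
    # Welch-style t statistic as a stand-in for the hypothesis-test ranker.
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b)) + 1e-12
    return np.abs(a.mean(0) - b.mean(0)) / se

def selection_counts(score_fn, X, y, k, n_folds, rng):
    # Count how often each gene lands in the top k across CV training folds.
    counts = np.zeros(X.shape[1], dtype=int)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        top = np.argsort(score_fn(X[train], y[train]))[::-1][:k]
        counts[top] += 1
    return counts

def ensemble_ranking(X, y, k=10, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    votes = np.zeros(X.shape[1], dtype=int)
    for fn in (correlation_score, fisher_score, t_score):
        counts = selection_counts(fn, X, y, k, n_folds, rng)
        # Step 1: keep the k genes this method selected most often.
        chosen = np.argsort(counts, kind="stable")[::-1][:k]
        votes[chosen] += 1
    # Step 2: rank genes by the number of methods that selected them.
    return np.argsort(-votes, kind="stable"), votes

# Synthetic demo (hypothetical data): 40 samples, 50 genes, 3 informative.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 50))
informative = [5, 17, 42]
for g in informative:
    X[y == 1, g] += 4.0  # shift expression in the "tumor" class

ranking, votes = ensemble_ranking(X, y, k=3)
print(sorted(ranking[:3].tolist()))
```

Counting selections across folds before voting across methods means a gene has to be chosen consistently by a method, and then by several methods, before it reaches the top of the final ranking.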
Pages: 461-470
Page count: 10