Predicting gene function using hierarchical multi-label decision tree ensembles

Cited by: 125
Authors
Schietgat, Leander [1]
Vens, Celine [1]
Struyf, Jan [1]
Blockeel, Hendrik [1]
Kocev, Dragi [2]
Dzeroski, Saso [2]
Affiliations
[1] Katholieke Univ Leuven, Dept Comp Sci, B-3001 Leuven, Belgium
[2] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana 1000, Slovenia
Source
BMC Bioinformatics | 2010 / Vol. 11
Funding
U.S. National Science Foundation; Research Foundation Flanders (Belgium);
Keywords
PROTEIN FUNCTION; SCALE DATA; CLASSIFICATION; INTEGRATION; ASSOCIATION; ANNOTATION; DATABASE;
DOI
10.1186/1471-2105-11-2
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Background: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability.
Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use.
Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.
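The abstract's idea of predicting all functions at once while respecting a hierarchy can be illustrated with a toy sketch. This is not the authors' implementation: the class labels, the `PARENT` map, and every function name below are hypothetical, and each "tree" is reduced to the mean label vector of the training examples in one leaf. The sketch assumes label sets are closed under ancestors (an ORF with function 1.1 also has function 1); under that assumption, averaged scores never rank a child class above its parent, so thresholded predictions stay consistent with the hierarchy.

```python
# Hypothetical tree-shaped function hierarchy: class -> parent (None = top level).
PARENT = {"1": None, "1.1": "1", "1.2": "1", "2": None}
CLASSES = sorted(PARENT)  # fixed class order for the label vectors


def close_under_ancestors(labels):
    """Add every ancestor of each label, so label sets respect the hierarchy."""
    closed = set()
    for label in labels:
        while label is not None:
            closed.add(label)
            label = PARENT[label]
    return closed


def to_vector(labels):
    """Encode a label set as a 0/1 vector over CLASSES."""
    closed = close_under_ancestors(labels)
    return [1.0 if c in closed else 0.0 for c in CLASSES]


def mean_vector(vectors):
    """Component-wise mean; used both inside a leaf and across trees."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def respects_hierarchy(scores):
    """True if no class scores higher than its parent."""
    by_class = dict(zip(CLASSES, scores))
    return all(
        parent is None or by_class[child] <= by_class[parent]
        for child, parent in PARENT.items()
    )


def predict(scores, threshold):
    """Return every class whose score reaches the threshold."""
    return {c for c, s in zip(CLASSES, scores) if s >= threshold}


# Two stand-in "trees": each is the mean label vector of one leaf's examples.
tree_a = mean_vector([to_vector({"1.1"}), to_vector({"1.2"})])
tree_b = mean_vector([to_vector({"1.1"}), to_vector({"2"})])

# Assumed ensemble rule for this sketch: average the per-tree score vectors.
ensemble = mean_vector([tree_a, tree_b])
print(dict(zip(CLASSES, ensemble)))
print(predict(ensemble, threshold=0.5))
```

Lowering the threshold trades precision for recall; because the scores are monotone along the hierarchy, any threshold yields a prediction set that contains the parent of every predicted class.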
Pages: 14