Predicting gene function using hierarchical multi-label decision tree ensembles

Cited by: 125
Authors
Schietgat, Leander [1]
Vens, Celine [1]
Struyf, Jan [1]
Blockeel, Hendrik [1]
Kocev, Dragi [2]
Dzeroski, Saso [2]
Affiliations
[1] Katholieke Univ Leuven, Dept Comp Sci, B-3001 Leuven, Belgium
[2] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana 1000, Slovenia
Source
BMC BIOINFORMATICS | 2010 / Volume 11
Funding
US National Science Foundation; Research Foundation Flanders (Belgium);
Keywords
PROTEIN FUNCTION; SCALE DATA; CLASSIFICATION; INTEGRATION; ASSOCIATION; ANNOTATION; DATABASE;
DOI
10.1186/1471-2105-11-2
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability.
Results: We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use.
Conclusions: Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.
Pages: 14