Hierarchical multi-label classification based on over-sampling and hierarchy constraint for gene function prediction

Cited by: 10
Authors
Chen, Benhui [1, 2]
Hu, Jinglu [1]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Wakamatsu Ku, Kitakyushu, Fukuoka 8080135, Japan
[2] Dali Univ, Sch Math & Comp Sci, Dali 671003, Yunnan, Peoples R China
Keywords
hierarchical multi-label classification; imbalanced dataset learning; hierarchical SMOTE; consistency ensemble
DOI
10.1002/tee.21714
CLC number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline codes
0808; 0809
Abstract
Hierarchical multi-label classification (HMC) is a variant of classification in which an instance may belong to several classes at once and the classes are organized in a hierarchy. Gene function prediction is a complex HMC problem with a large number of classes and, typically, strongly imbalanced class distributions. This paper proposes an improved HMC method for gene function prediction that is based on over-sampling and a hierarchy constraint. The HMC task is decomposed into a set of binary support vector machine (SVM) classification tasks. Two measures then improve HMC performance by introducing the hierarchy constraint into the learning procedure. First, to handle imbalanced classes, a hierarchical synthetic minority over-sampling technique (SMOTE) is proposed as an over-sampling preprocessing step that improves SVM learning. Second, an improved True Path Rule (TPR) ensemble approach is introduced to combine the outputs of the binary probabilistic SVM classifiers. This improves the classification results and guarantees that the predictions respect the class hierarchy. Experimental results on four benchmark FunCat Yeast datasets show that the proposed method significantly outperforms both the basic TPR method and the Flat ensemble method. (C) 2012 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
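The abstract names two techniques but does not give their formulas. The following Python sketch illustrates standard versions of both under stated assumptions. It is not the authors' hierarchical SMOTE or their improved TPR ensemble.

- `smote` is standard SMOTE interpolation. Each synthetic sample lies on the segment between a minority sample and one of its k nearest minority neighbors.
- `hierarchical_smote` is a hypothetical hierarchy-aware wrapper. It assumes every synthetic positive for a class is also labeled positive for all of that class's ancestors.
- `tpr_ensemble` follows the basic TPR scheme as commonly described. A bottom-up pass averages each class's own probability with the scores of its children predicted positive (above a threshold `t`, assumed to be 0.5). A top-down pass, assumed here, clips each class's score to its parent's.

The per-class probabilities `p_hat` would come from the binary probabilistic SVMs, which are not shown. The class tree is encoded as a hypothetical `parent` list, with -1 marking top-level classes.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Standard SMOTE: each synthetic sample is placed at a random point on
    the segment between a minority sample and one of its k nearest
    minority neighbors (Euclidean distance)."""
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)
    # Pairwise distances; a sample must not be its own neighbor.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    synth = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(n)                  # random minority sample
        j = neighbors[i, rng.integers(k)]    # one of its k neighbors
        gap = rng.random()                   # interpolation factor in [0, 1)
        synth[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synth


def ancestors_and_self(c, parent):
    """Class c followed by all its ancestors (parent[c] == -1 at top level)."""
    path = []
    while c >= 0:
        path.append(c)
        c = parent[c]
    return path


def hierarchical_smote(X, Y, cls, parent, n_new, k=5, seed=0):
    """HYPOTHETICAL hierarchy-aware wrapper (an assumption, not the paper's
    method): oversample the positives of class `cls` and label each
    synthetic sample positive for `cls` and every ancestor of `cls`."""
    X_pos = np.asarray(X, dtype=float)[Y[:, cls] == 1]
    X_new = smote(X_pos, n_new, k, seed)
    Y_new = np.zeros((n_new, Y.shape[1]), dtype=Y.dtype)
    Y_new[:, ancestors_and_self(cls, parent)] = 1
    return X_new, Y_new


def tpr_ensemble(p_hat, parent, t=0.5):
    """Sketch of a basic True Path Rule ensemble over a class tree.
    p_hat[c]: probability from the binary classifier of class c.
    parent[c]: index of the parent class, or -1 for a top-level class."""
    p_hat = np.asarray(p_hat, dtype=float)
    C = len(p_hat)
    children = [[] for _ in range(C)]
    for c, par in enumerate(parent):
        if par >= 0:
            children[par].append(c)
    depth = [len(ancestors_and_self(c, parent)) - 1 for c in range(C)]
    p_bar = p_hat.copy()
    # Bottom-up (deepest classes first): average the class's own probability
    # with the ensemble scores of its children predicted positive (> t).
    for c in sorted(range(C), key=lambda i: -depth[i]):
        pos = [p_bar[j] for j in children[c] if p_bar[j] > t]
        p_bar[c] = (p_hat[c] + sum(pos)) / (1 + len(pos))
    # Top-down (assumed variant): no class may score higher than its parent.
    for c in sorted(range(C), key=lambda i: depth[i]):
        if parent[c] >= 0:
            p_bar[c] = min(p_bar[c], p_bar[parent[c]])
    return p_bar
```

For example, with `parent = [-1, 0, 0, 1]`, `tpr_ensemble([0.4, 0.7, 0.2, 0.9], parent)` returns approximately `[0.6, 0.6, 0.2, 0.6]`. After the top-down pass, every class scores no higher than its parent. Thresholding the result at `t` therefore never predicts a class without also predicting all of its ancestors. This is one way to satisfy the hierarchy constraint mentioned in the abstract.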
Pages: 183-189
Page count: 7