Multiclass Classification on High Dimension and Low Sample Size Data Using Genetic Programming

Cited by: 5
Authors
Wei, Tingyang [1]
Liu, Wei-Li [1]
Zhong, Jinghui [1]
Gong, Yue-Jiao [1]
Affiliations
[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
Keywords
Machine learning; Feature extraction; Gene expression; Programming; Genetic programming; Sociology; Statistics; gene expression programming; high dimension; classification; low sample size; ensemble learning; FEATURE-SELECTION; RULES
DOI
10.1109/TETC.2020.3034495
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multiclass classification is one of the most fundamental tasks in data mining. However, because traditional data mining methods rely on model assumptions, they generally suffer from overfitting on high dimension and low sample size (HDLSS) data. To address multiclass classification on HDLSS data from another perspective, we use Genetic Programming (GP), an intrinsic evolutionary classification algorithm that performs feature construction automatically without model assumptions. This article develops an ensemble-based genetic programming classification framework, the Sigmoid-based Ensemble Gene Expression Programming (SE-GEP). To relieve the problem of output conflict in GP-based multiclass classifiers, the proposed method employs a flexible probability representation with continuous relaxation to better integrate the outputs of all the binary classifiers, an effective data division strategy to further enhance ensemble performance, and a novel sampling strategy to refine the existing GP-based binary classifier. The experimental results indicate that SE-GEP attains better classification accuracy than other GP methods. Moreover, comparison with other representative machine learning methods indicates that SE-GEP is a competitive method for multiclass classification on HDLSS data.
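The output-conflict problem mentioned in the abstract can be illustrated with a minimal sketch. It assumes a one-vs-rest setup in which each binary classifier returns a raw real-valued score; the function names (`hard_vote`, `soft_predict`), the threshold at 0, and the argmax rule are illustrative assumptions, not the SE-GEP procedure described in the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hard_vote(raw_outputs):
    """Classes whose binary classifier fires under a hard threshold at 0.

    Hypothetical baseline: returns zero, one, or several class indices,
    which is where output conflicts come from.
    """
    return [k for k, z in enumerate(raw_outputs) if z > 0]


def soft_predict(raw_outputs):
    """Relax each raw output to a score in (0, 1) and pick the largest.

    Assumed integration rule (argmax over sigmoid scores); the paper's
    actual probability representation may differ.
    """
    probs = [sigmoid(z) for z in raw_outputs]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Two classifiers fire at once: the hard vote is ambiguous.
raw = [0.4, 2.1, -1.3]
conflict = hard_vote(raw)
label, probs = soft_predict(raw)

# No classifier fires: the hard vote is empty, the soft rule still decides.
raw_none = [-0.5, -0.1, -2.0]
no_vote = hard_vote(raw_none)
label_none, _ = soft_predict(raw_none)
```

With hard thresholds, the first sample is claimed by two classes and the second by none; the continuous scores give a single prediction in both cases.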
Pages: 704-718
Page count: 15