Improving the accuracy of multiclass classification in machine learning: A case study in a cell signaling dataset

Cited by: 3
Authors
Pablo Gonzalez-Perez, Pedro [1]
Eduardo Sanchez-Gutierrez, Maximo [2]
Affiliations
[1] Univ Autonoma Metropolitana Cuajimalpa, Dept Matemat Aplicadas & Sistemas, Ciudad De Mexico, Mexico
[2] Univ Autonoma Ciudad Mexico, Colegio Ciencia & Tecnol, Ciudad De Mexico, Mexico
Keywords
Multiclass classification; machine learning; exploratory data analysis; dimensionality reduction; cellular signaling data; feature selection; diagnosis
DOI
10.3233/IDA-215826
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Proposing a useful model requires making sense of the data within its context. Such domain knowledge includes information that is not contained in the data but that helps us understand the data fed into a machine-learning algorithm and indicates which features might help the model. However, domain knowledge can become insufficient as the number of input variables grows, which makes automated feature selection techniques necessary. In this study, we investigate whether the joint use of 1) feature selection techniques, such as Chi-square, Tree-based Feature Selection, Pearson's Correlation, LASSO, Low Variance, and Recursive Feature Elimination, 2) outlier detection methods, such as Isolation Forest, and 3) cross-validation techniques improves accuracy in multiclass classification. Specifically, we address the classification of patterns representing the activation state of cell signaling components into classes that represent the different cellular processes triggered in cancer cells. The results show an increase in accuracy with roughly 80% fewer input features, using only 3 of the 16 original descriptors.
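The combination of feature selection and cross-validation named in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: it uses only the Python standard library, covers just one of the listed techniques (a Pearson-correlation filter, applied here one-vs-rest per class) together with k-fold cross-validation, and leaves out Isolation Forest, LASSO, Chi-square, tree-based selection, and Recursive Feature Elimination. The nearest-centroid classifier, the synthetic 16-feature dataset, and every function name are assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def select_top_k(X, y, k):
    """Rank features by their largest one-vs-rest |Pearson r| and keep the top k."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        score = max(
            abs(pearson(col, [1.0 if t == c else 0.0 for t in y])) for c in classes
        )
        scores.append((score, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def fit_centroids(X, y, cols):
    """Nearest-centroid model (an assumed classifier): one mean vector per class."""
    centroids = {}
    for c in set(y):
        rows = [[row[j] for j in cols] for row, t in zip(X, y) if t == c]
        centroids[c] = [sum(v) / len(rows) for v in zip(*rows)]
    return centroids


def predict(centroids, row, cols):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    point = [row[j] for j in cols]
    return min(
        centroids,
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def cross_val_accuracy(X, y, k_features, n_folds=5, seed=0):
    """k-fold CV; features are re-selected inside each training fold to avoid leakage."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        cols = select_top_k(X_tr, y_tr, k_features)
        model = fit_centroids(X_tr, y_tr, cols)
        correct += sum(predict(model, X[i], cols) == y[i] for i in fold)
    return correct / len(X)


def make_synthetic(n_per_class=40, n_features=16, seed=0):
    """Hypothetical data: 3 classes, 16 features, only features 0-2 are informative."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[label] += 4.0  # feature `label` is shifted for its own class
            X.append(row)
            y.append(label)
    return X, y


X, y = make_synthetic()
acc_all = cross_val_accuracy(X, y, k_features=16)
acc_top3 = cross_val_accuracy(X, y, k_features=3)
print(f"all 16 features: {acc_all:.3f}")
print(f"top 3 features:  {acc_top3:.3f}")
```

Re-running the selection inside each training fold keeps the held-out samples from influencing which features are chosen; selecting once on the full dataset would tend to give optimistic accuracy estimates. Keeping 3 of 16 features, as in this sketch, removes 13/16, or about 81%, of the inputs.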
Pages: 481-500
Page count: 20