Improving the accuracy of multiclass classification in machine learning: A case study in a cell signaling dataset

Cited by: 1
Authors
Gonzalez-Perez, Pedro Pablo [1]
Sanchez-Gutierrez, Maximo Eduardo [2]
Affiliations
[1] Univ Autonoma Metropolitana Cuajimalpa, Dept Matemat Aplicadas & Sistemas, Ciudad De Mexico, Mexico
[2] Univ Autonoma Ciudad Mexico, Colegio Ciencia & Tecnol, Ciudad De Mexico, Mexico
Keywords
Multiclass classification; machine learning; exploratory data analysis; dimensionality reduction; cellular signaling data; feature selection; diagnosis
DOI
10.3233/IDA-215826
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Proposing a useful model for a problem requires making sense of the data within its context. This domain knowledge includes information that is not contained in the data but that helps us understand the data fed into a machine-learning algorithm and indicates which features might help the model. Nevertheless, domain knowledge may become insufficient as the number of input variables grows, which makes automated feature selection techniques necessary. In this study, we investigate whether the joint use of 1) feature selection techniques such as Chi-square, Tree-based Feature Selection, Pearson's Correlation, LASSO, Low Variance, and Recursive Feature Elimination, 2) outlier detection methods such as Isolation Forest, and 3) cross-validation techniques improves accuracy in multiclass classification. Specifically, we address the classification of patterns representing the activation state of cell signaling components into classes that symbolize the different cellular processes triggered in cancer cells. The results show an increase in accuracy with up to 80% fewer input features, using only 3 of the 16 original descriptors.
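The combination described in the abstract (outlier filtering, feature selection, and cross-validated evaluation) can be sketched with scikit-learn. This is a minimal illustration and not the authors' implementation: the data are synthetic (16 descriptors and 4 classes are placeholder choices, since the record does not give the number of classes), Recursive Feature Elimination stands in for the whole family of selectors listed, and the random forest classifier and all parameter values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cell signaling data (assumption):
# 16 input descriptors, of which only 3 carry class information.
X, y = make_classification(
    n_samples=600, n_features=16, n_informative=3, n_redundant=2,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# 1) Outlier detection: Isolation Forest labels inliers as 1, outliers as -1.
#    It is unsupervised, so it is applied once here for brevity; a stricter
#    protocol would fit it inside each training fold.
inlier_mask = IsolationForest(contamination=0.05, random_state=0).fit_predict(X) == 1
X_in, y_in = X[inlier_mask], y[inlier_mask]

# 2) Feature selection: RFE keeps 3 of the 16 descriptors. Placing it inside
#    the pipeline means it is refit on every training fold (no leakage).
reduced_model = make_pipeline(
    RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=3),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
baseline_model = RandomForestClassifier(n_estimators=100, random_state=0)

# 3) Cross-validation: stratified folds preserve class proportions.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
baseline_acc = cross_val_score(baseline_model, X_in, y_in, cv=cv).mean()
reduced_acc = cross_val_score(reduced_model, X_in, y_in, cv=cv).mean()

# Fraction of input features removed when keeping 3 of 16.
reduction = 1 - 3 / X.shape[1]

print(f"all 16 features: {baseline_acc:.3f}")
print(f"3 selected features: {reduced_acc:.3f} ({reduction:.0%} fewer inputs)")
```

Whether the reduced model matches or beats the baseline depends on the data; on the synthetic set above the comparison only illustrates the workflow, not the paper's result.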
Pages: 481-500
Page count: 20
Related papers
50 in total
  • [1] Improving Machine Learning Classification Accuracy for Breathing Abnormalities by Enhancing Dataset
    Rehman, Mubashir
    Shah, Raza Ali
    Khan, Muhammad Bilal
    Shah, Syed Aziz
    AbuAli, Najah Abed
    Yang, Xiaodong
    Alomainy, Akram
    Imran, Muhmmad Ali
    Abbasi, Qammer H.
    SENSORS, 2021, 21 (20)
  • [2] A machine learning software tool for multiclass classification
    Wang, Shangzhou
    Lu, Haohui
    Khan, Arif
    Hajati, Farshid
    Khushi, Matloob
    Uddin, Shahadat
    SOFTWARE IMPACTS, 2022, 13
  • [3] Extreme Learning Machine for Regression and Multiclass Classification
    Huang, Guang-Bin
    Zhou, Hongming
    Ding, Xiaojian
    Zhang, Rui
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2012, 42 (02): : 513 - 529
  • [4] Improving Classification Accuracy of a Machine Learning approach for FPGA Timing Closure
    Que Yanghua
    Kapre, Nachiket
    Ng, Harnhua
    Teo, Kirvy
    2016 IEEE 24TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM), 2016, : 80 - 83
  • [5] Multiclass Classification of Brain Cancer with Machine Learning Algorithms
    Erkal, Begum
    Basak, Selen
    Ciloglu, Alper
    Sener, Duygu Dede
    2020 MEDICAL TECHNOLOGIES CONGRESS (TIPTEKNO), 2020,
  • [6] Dataset Anonimyzation for Machine Learning: An ISP Case Study
    Campanile, Lelio
    Forgione, Fabio
    Marulli, Fiammetta
    Palmiero, Gianfranco
    Sanghez, Carlo
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2021, PT II, 2021, 12950 : 589 - 597
  • [7] How Machine Learning Classification Accuracy Changes in a Happiness Dataset with Different Demographic Groups
    Sweeney, Colm
    Ennis, Edel
    Mulvenna, Maurice
    Bond, Raymond
    O'Neill, Siobhan
    COMPUTERS, 2022, 11 (05)
  • [8] Multiclass Classification Machine Learning Identification of Common Poisonings
    Nogee, Daniel
    Haimovich, Adrian
    Hart, Katherine
    Tomassoni, Anthony
    CLINICAL TOXICOLOGY, 2020, 58 (11) : 1083 - 1084
  • [9] Feasibility of Active Machine Learning for Multiclass Compound Classification
    Lang, Tobias
    Flachsenberg, Florian
    von Luxburg, Ulrike
    Rarey, Matthias
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2016, 56 (01) : 12 - 20
  • [10] Predictive modeling of gestational weight gain: a machine learning multiclass classification study
    Audêncio Victor
    Hellen Geremias dos Santos
    Gabriel Ferreira Santos Silva
    Fabiano Barcellos Filho
    Alexandre de Fátima Cobre
    Liania A. Luzia
    Patrícia H.C. Rondó
    Alexandre Dias Porto Chiavegatto Filho
    BMC Pregnancy and Childbirth, 24 (1)