Improving the accuracy of multiclass classification in machine learning: A case study in a cell signaling dataset

Cited by: 3
Authors
Gonzalez-Perez, Pedro Pablo [1]
Sanchez-Gutierrez, Maximo Eduardo [2]
Affiliations
[1] Univ Autonoma Metropolitana Cuajimalpa, Dept Matemat Aplicadas & Sistemas, Ciudad De Mexico, Mexico
[2] Univ Autonoma Ciudad Mexico, Colegio Ciencia & Tecnol, Ciudad De Mexico, Mexico
Keywords
Multiclass classification; machine learning; exploratory data analysis; dimensionality reduction; cellular signaling data; feature selection; diagnosis
DOI
10.3233/IDA-215826
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
It is important to make sense of the data within its context in order to propose a useful model for solving a problem. This domain knowledge includes information that is not contained in the data but that helps us understand the data to be fed into a machine-learning algorithm and guides us on which features might help our model. Nevertheless, domain knowledge may become insufficient as the number of input variables increases, making automated feature selection techniques necessary. In this study, we investigate whether the joint use of 1) feature selection techniques, such as Chi-square, Tree-based Feature Selection, Pearson's Correlation, LASSO, Low Variance, and Recursive Feature Elimination, 2) outlier detection methods, such as Isolation Forest, and 3) cross-validation techniques improves accuracy in multiclass classification. Specifically, we address the classification of patterns representing the activation state of cell signaling components into classes that symbolize the different cellular processes triggered in cancer cells. The results show an increase in accuracy with up to 80% fewer input features, using only 3 of the 16 original descriptors.
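Two of the filter-style techniques named in the abstract, Low Variance and Pearson's Correlation, can be illustrated with a minimal standard-library sketch. This is not the authors' pipeline: the synthetic data, the function names (`variance`, `pearson`, `select_features`), the variance threshold, and the choice to treat the class label as a numeric value for the correlation are all assumptions made for illustration. The dataset is built with 16 descriptors so that reducing to 3 mirrors the proportions mentioned in the abstract.

```python
import math
import random


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_features(X, y, k, min_variance=1e-8):
    """Drop near-constant columns, then keep the k columns with the largest |r| to y."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    # Step 1 (Low Variance): discard columns that barely change across samples.
    kept = [j for j in range(n_features) if variance(columns[j]) > min_variance]
    # Step 2 (Pearson's Correlation): rank the survivors by absolute correlation.
    ranked = sorted(kept, key=lambda j: abs(pearson(columns[j], y)), reverse=True)
    return sorted(ranked[:k])


# Hypothetical synthetic data: 200 samples, 3 classes, 16 descriptors.
rng = random.Random(0)
N_SAMPLES, N_FEATURES = 200, 16
y = [rng.randrange(3) for _ in range(N_SAMPLES)]
X = []
for label in y:
    informative = [label + rng.gauss(0, 0.1) for _ in range(3)]  # columns 0-2
    noise = [rng.random() for _ in range(N_FEATURES - 4)]        # columns 3-14
    X.append(informative + noise + [1.0])                        # column 15 is constant

selected = select_features(X, y, k=3)
reduction = 100 * (1 - len(selected) / N_FEATURES)
print(selected, f"{reduction:.2f}% fewer features")
```

Treating class labels as numbers is a crude shortcut that only makes sense here because the synthetic features were built to scale with the label. For real multiclass data, a class-aware score such as Chi-square, or one of the wrapper and embedded methods listed in the abstract, would be a more defensible choice, and the selection step should be repeated inside each cross-validation fold to avoid leaking information from the held-out samples.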
Pages: 481-500
Page count: 20