Feature selection for effective prediction of SARS-COV-2 using machine learning

Times cited: 0
Authors
Punacha, Gagan [1]
Adiga, Rama [1]
Affiliation
[1] Nitte, Nitte Univ Ctr Sci Educ & Res NUCSER, Dept Mol Genet & Canc, Mangalore, Karnataka, India
Keywords
Machine learning; Surveillance; SARS-CoV-2; Clustering; Classification
DOI
10.1007/s13258-023-01467-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Subject classification code
071010; 081704
Abstract
Background: With the rise in SARS-CoV-2 variants, it is necessary to classify emerging SARS-CoV-2 strains for early detection and thereby reduce human transmission. Genomic and proteomic information has been used less frequently for classification in machine learning (ML) approaches to detecting SARS-CoV-2.
Objective: To this end, we used nucleoprotein and viral proteomic evolutionary information of SARS-CoV-2, together with the charge and basicity distribution of amino acids from various SARS-CoV-2 strains, to build an ML-based disease severity model.
Methods: All sequence and clinical data were obtained from GISAID. Proteomic-level calculations were added to complete the dataset. The training set was used for feature selection. The Select K-Best feature selection method was employed, cross-validated with the testing set, and its performance evaluated. DeLong's test was also performed. BIRCH clustering was additionally applied to cluster the SARS-CoV-2 strains.
Results: Of the six ML models, four were successful in training and testing. The Extra Trees algorithm achieved a micro-averaged F1-score of 74.2% and a weighted-average area under the receiver operating characteristic curve (AUC-ROC) of 73.7% with the multi-class option. Setting the number of selected features to 5 improved the ROC AUC from 73.7% to 76.4%. The selected model achieved an accuracy of 86.9%.
Conclusion: The unique features identified by the ML approach were able to classify disease severity into classes and have potential for predicting risk in newer variants.
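The Objective names the charge and basicity distribution of amino acids as model features but does not define them. A minimal sketch of one possible encoding, assuming simple residue-class counts: K, R and H are treated as basic and D and E as acidic. The function name `charge_basicity_features` and the feature set are hypothetical, not the authors' definitions.

```python
from collections import Counter

# Hypothetical residue classes: the abstract does not define how charge and
# basicity were encoded, so this grouping is an illustrative assumption.
BASIC = frozenset("KRH")   # lysine, arginine, histidine side chains
ACIDIC = frozenset("DE")   # aspartate, glutamate side chains


def charge_basicity_features(seq: str) -> dict:
    """Simple charge/basicity composition features for one protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    n_basic = sum(counts[aa] for aa in BASIC)
    n_acidic = sum(counts[aa] for aa in ACIDIC)
    return {
        "frac_basic": n_basic / n,    # share of basic residues
        "frac_acidic": n_acidic / n,  # share of acidic residues
        # Crude net-charge proxy: counts every H as +1, which overstates the
        # positive charge at physiological pH (histidine pKa is about 6).
        "net_charge_proxy": n_basic - n_acidic,
        # Ratio is reported as infinity when there are no acidic residues.
        "basic_to_acidic_ratio": n_basic / n_acidic if n_acidic else float("inf"),
    }


# Usage on a toy sequence (not a real SARS-CoV-2 protein):
print(charge_basicity_features("KRHDEAA"))
```

Per-protein feature dictionaries like this could then be concatenated across proteins to form one row per strain; that layout is also an assumption.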
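The Methods describe univariate feature selection with Select K-Best, fitted on the training set, with the number of features later set to 5. The abstract does not name the scoring function, so this minimal sketch assumes the ANOVA F-value, which is the default score function (`f_classif`) of scikit-learn's `SelectKBest`. It is written in plain Python to stay dependency-free; `anova_f` and `select_k_best` are hypothetical helper names and the data are toy values.

```python
from collections import defaultdict


def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across class labels."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need >= 2 classes and more samples than classes")
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        # No within-class spread: F is unbounded; report 0 if nothing varies at all.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X_train, y_train, k=5):
    """Indices of the k training-set columns with the highest ANOVA F scores."""
    n_features = len(X_train[0])
    scores = [anova_f([row[j] for row in X_train], y_train) for j in range(n_features)]
    top = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(top), scores


# Toy training data: column 0 separates the classes, column 2 does not.
X_train = [[1.0, 5.0, 0.1], [1.2, 5.1, 0.9], [3.0, 5.0, 0.2], [3.1, 4.9, 0.8]]
y_train = ["mild", "mild", "severe", "severe"]
selected, scores = select_k_best(X_train, y_train, k=1)
print(selected, [round(s, 2) for s in scores])
```

With scikit-learn the equivalent step would be `SelectKBest(f_classif, k=5).fit(X_train, y_train)`. Fitting the selector on the training set only keeps the testing set unseen during selection.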
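The Results quote a micro-averaged F1-score and a weighted-average AUC-ROC computed with the multi-class option. As a hedged illustration of how these metrics are commonly defined, this sketch pools true positives, false positives and false negatives across classes for micro-F1, and averages one-vs-rest AUCs weighted by each class's prevalence. That matches scikit-learn's `roc_auc_score(..., multi_class="ovr", average="weighted")`; whether the authors used one-vs-rest is an assumption, and the function names and data are hypothetical.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes before computing F1."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            tp += int(t == c and p == c)
            fp += int(t != c and p == c)
            fn += int(t == c and p != c)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def binary_auc(labels01, scores):
    """AUC as the probability a positive outscores a negative (ties count 0.5)."""
    pos = [s for lab, s in zip(labels01, scores) if lab == 1]
    neg = [s for lab, s in zip(labels01, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("each class needs both positive and negative samples")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def weighted_ovr_auc(y_true, proba, classes):
    """One-vs-rest AUC per class, weighted by class prevalence in y_true.

    proba[i][j] is the predicted probability that sample i belongs to classes[j].
    """
    n = len(y_true)
    total = 0.0
    for j, c in enumerate(classes):
        labels01 = [1 if y == c else 0 for y in y_true]
        total += sum(labels01) / n * binary_auc(labels01, [row[j] for row in proba])
    return total


# Toy three-class severity example.
classes = ["mild", "moderate", "severe"]
y_true = ["mild", "mild", "moderate", "severe"]
y_pred = ["mild", "moderate", "moderate", "severe"]
proba = [[0.7, 0.2, 0.1], [0.3, 0.65, 0.05], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]]
print(micro_f1(y_true, y_pred), weighted_ovr_auc(y_true, proba, classes))
```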
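The Methods state that DeLong's test was performed but give no details. DeLong's test compares two correlated AUCs measured on the same samples, estimating the variance of their difference from per-sample placement values (structural components). A minimal binary-case sketch, assuming both models score the same samples; applying it to multi-class severity (for example per class, one-vs-rest) would be a further assumption. Function names and scores are hypothetical.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if x ranks above y, 0.5 on ties, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(pos, neg):
    """DeLong structural components: V10 per positive, V01 per negative sample."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels01, scores_a, scores_b):
    """Two-sided DeLong test comparing the AUCs of two models on the same samples."""
    pos = [i for i, lab in enumerate(labels01) if lab == 1]
    neg = [i for i, lab in enumerate(labels01) if lab == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative samples")
    v10_a, v01_a = _placements([scores_a[i] for i in pos], [scores_a[i] for i in neg])
    v10_b, v01_b = _placements([scores_b[i] for i in pos], [scores_b[i] for i in neg])
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    # Variance of (AUC_a - AUC_b) from the paired structural components.
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:
        return auc_a, auc_b, float("nan")  # degenerate case: test is undefined
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, p


# Toy scores for six samples (three positives, three negatives).
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
model_b = [0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
print(delong_test(labels, model_a, model_b))
```

With only six toy samples the p-value is large; the sketch is meant to show the mechanics, not to suggest any result from the paper.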
Pages: 95-112
Page count: 18
Related papers (50 in total)
  • [1] Feature selection for effective prediction of SARS-COV-2 using machine learning
    Punacha, Gagan
    Adiga, Rama
    [J]. GENES & GENOMICS, 2024, 46: 341-354
  • [2] Prediction of antigenic peptides of SARS-CoV-2 pathogen using machine learning
    Bukhari, Syed Nisar Hussain
    Ogudo, Kingsley A.
    [J]. PEERJ COMPUTER SCIENCE, 2024, 10
  • [3] Classification of SARS-CoV-2 and non-SARS-CoV-2 using machine learning algorithms
    Singh, Om Prakash
    Vallejo, Marta
    El-Badawy, Ismail M.
    Aysha, Ali
    Madhanagopal, Jagannathan
    Faudzi, Ahmad Athif Mohd
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2021, 136
  • [4] Machine Learning for Prediction of Patients on Hemodialysis with an Undetected SARS-CoV-2 Infection
    Monaghan, Caitlin K.
    Larkin, John W.
    Chaudhuri, Sheetal
    Han, Hao
    Jiao, Yue
    Bermudez, Kristine M.
    Weinhandl, Eric D.
    Dahne-Steuber, Ines A.
    Belmonte, Kathleen
    Neri, Luca
    Kotanko, Peter
    Kooman, Jeroen P.
    Hymes, Jeffrey L.
    Kossmann, Robert J.
    Usvyat, Len A.
    Maddux, Franklin W.
    [J]. KIDNEY360, 2021, 2 (03): 456-468
  • [5] Antiprotozoal peptide prediction using machine learning with effective feature selection techniques
    Periwal, Neha
    Arora, Pooja
    Thakur, Ananya
    Agrawal, Lakshay
    Goyal, Yash
    Rathore, Anand S.
    Anand, Harsimrat Singh
    Kaur, Baljeet
    Sood, Vikas
    [J]. HELIYON, 2024, 10 (16)
  • [6] Robust Representation and Efficient Feature Selection Allows for Effective Clustering of SARS-CoV-2 Variants
    Tayebi, Zahra
    Ali, Sarwan
    Patterson, Murray
    [J]. ALGORITHMS, 2021, 14 (12)
  • [7] Machine learning application for the prediction of SARS-CoV-2 infection using blood tests and chest radiograph
    Du, Richard
    Tsougenis, Efstratios D.
    Ho, Joshua W. K.
    Chan, Joyce K. Y.
    Chiu, Keith W. H.
    Fang, Benjamin X. H.
    Ng, Ming Yen
    Leung, Siu-Ting
    Lo, Christine S. Y.
    Wong, Ho-Yuen F.
    Lam, Hiu-Yin S.
    Chiu, Long-Fung J.
    So, Tiffany Y.
    Wong, Ka Tak
    Wong, Yiu Chung, I
    Yu, Kevin
    Yeung, Yiu-Cheong
    Chik, Thomas
    Pang, Joanna W. K.
    Wai, Abraham Ka-chung
    Kuo, Michael D.
    Lam, Tina P. W.
    Khong, Pek-Lan
    Cheung, Ngai-Tseung
    Vardhanabhuti, Varut
    [J]. SCIENTIFIC REPORTS, 2021, 11 (01)
  • [9] Enhanced SARS-CoV-2 case prediction using public health data and machine learning models
    Price, Bradley S.
    Khodaverdi, Maryam
    Hendricks, Brian
    Smith, Gordon S.
    Kimble, Wes
    Halasz, Adam
    Guthrie, Sara
    Fraustino, Julia D.
    Hodder, Sally L.
    [J]. JAMIA OPEN, 2024, 7 (01)
  • [10] Machine learning prediction of 3CLpro SARS-CoV-2 docking scores
    Bucinsky, Lukas
    Bortnak, Dusan
    Gall, Marian
    Matuska, Jan
    Milata, Viktor
    Pitonak, Michal
    Stelac, Marek
    Vegh, Daniel
    Zajacek, David
    [J]. COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2022, 98