A cross dataset meta-model for hepatitis C detection using multi-dimensional pre-clustering

Cited by: 0
Authors
Sharma, Aryan [1]
Khade, Tanmay [1]
Satapathy, Shashank Mouli [1]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Source
SCIENTIFIC REPORTS | 2025 / Volume 15 / Issue 01
Keywords
Hepatitis C; Clustering; K-centroid clustering; K-means clustering; K-modes clustering; Machine learning; Stacking meta-model; XGBoost; KNN; SVM; RF; VIRUS-INFECTION; CIRRHOSIS
DOI
10.1038/s41598-025-91298-0
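Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract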
Hepatitis C is a liver infection caused by the hepatitis C virus (HCV). The infection causes swelling and irritation of the liver, known as inflammation. Prolonged untreated exposure to the virus can lead to chronic hepatitis C, which can result in serious health complications such as liver damage, hepatocellular carcinoma (HCC), and potentially death. Rapid diagnosis and prompt treatment of HCV are therefore crucial. This study uses machine learning (ML) to identify hepatitis C in patients by analyzing parameters obtained from a standard biochemistry test. A hybrid dataset was built by merging two commonly used datasets from separate sources, and a portion of it was kept as a hold-out set to simulate real-world data. A multi-dimensional pre-clustering approach was applied, using k-means for binning and k-modes for categorical clustering, to extract a new feature. This feature column was added to the original dataset, which was then used to train a stacked meta-model. The models used were XGBoost, K-nearest neighbor, support vector classifier, and random forest (RF). The meta-model was compared against baseline models, and its predictions were further interpreted using explainable artificial intelligence. The best baseline score was 94.25%, obtained with RF, while the meta-model scored 94.82%.
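The pre-clustering step described in the abstract might be sketched as follows. This is only an illustrative reading, not the authors' implementation: the abstract does not say how bins, clusters, or initial centers were chosen, so the bin count, cluster count, deterministic initialization, and all function names here (`kmeans_1d_bins`, `kmodes`, `add_cluster_feature`) are assumptions. The stacked meta-model itself is omitted.

```python
from collections import Counter


def kmeans_1d_bins(values, k, iters=50):
    """Bin one numeric column with 1-D k-means; returns a bin index per value."""
    lo, hi = min(values), max(values)
    # Assumed deterministic start: centers spread evenly over the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty bin keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def hamming(a, b):
    """Number of positions at which two categorical rows differ."""
    return sum(x != y for x, y in zip(a, b))


def kmodes(rows, k, iters=50):
    """Cluster categorical rows (tuples) with k-modes; returns a cluster index per row."""
    # Assumed deterministic start: the first k distinct rows are the initial modes.
    modes = []
    for r in rows:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(len(modes)), key=lambda c: hamming(r, modes[c])) for r in rows]
        new_modes = []
        for c in range(len(modes)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if not members:
                new_modes.append(modes[c])
                continue
            # The new mode takes the most frequent category in each column.
            new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
        if new_modes == modes:
            break
        modes = new_modes
    return labels


def add_cluster_feature(table, n_bins=3, n_clusters=2):
    """Bin every numeric column, cluster the binned rows, append the cluster id."""
    binned_cols = [kmeans_1d_bins(list(col), n_bins) for col in zip(*table)]
    binned_rows = [tuple(r) for r in zip(*binned_cols)]
    clusters = kmodes(binned_rows, n_clusters)
    return [list(row) + [c] for row, c in zip(table, clusters)]


# Toy, made-up values (not taken from the study's datasets).
toy = [[20.0, 0.5], [22.0, 0.6], [21.0, 0.4], [90.0, 3.0], [95.0, 3.2], [88.0, 2.9]]
augmented = add_cluster_feature(toy, n_bins=2, n_clusters=2)
```

Each row of `augmented` carries the original values plus one extra column holding the cluster index, which is the kind of extracted feature the abstract describes being passed to the downstream classifiers.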
Pages: 17