A cross dataset meta-model for hepatitis C detection using multi-dimensional pre-clustering

Cited by: 0

Authors:

Sharma, Aryan [1]

Khade, Tanmay [1]

Satapathy, Shashank Mouli [1]

Affiliations:

[1] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India

Source:

SCIENTIFIC REPORTS | 2025 / Vol. 15 / Issue 01

Keywords:

Hepatitis C; Clustering; K-centroid clustering; K-means clustering; K-modes clustering; Machine learning; Stacking meta-model; XGBoost; KNN; SVM; RF; VIRUS-INFECTION; CIRRHOSIS;

DOI:

10.1038/s41598-025-91298-0

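Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];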

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Hepatitis C is a liver infection triggered by the hepatitis C virus (HCV). The infection results in swelling and irritation of the liver, which is called inflammation. Prolonged untreated exposure to the virus can lead to chronic hepatitis C. This can result in serious health complications such as liver damage, hepatocellular carcinoma (HCC), and potentially death. Therefore, rapid diagnosis and prompt treatment of HCV are crucial. This study utilizes machine learning (ML) to precisely identify hepatitis C in patients by analyzing parameters obtained from a standard biochemistry test. A hybrid dataset was acquired by merging two commonly used datasets from individual sources. A portion of the dataset was used as a hold-out set to simulate real-world data. A multi-dimensional pre-clustering approach was used in this study in the form of k-means for binning and k-modes for categorical clustering. The pre-clustering approach was used to extract a new feature. This extracted feature column was added to the original dataset and was used to train a stacked meta-model. The model was compared against baseline models. The predictions were further elaborated using explainable artificial intelligence. The models used were XGBoost, K-nearest neighbor, support vector classifier, and random forest (RF). The baseline score obtained was 94.25% using RF, while the meta-model gave a score of 94.82%.
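
The abstract does not spell out how the pre-clustering feature is computed, so the following is only a minimal sketch of one plausible reading: each numeric column is binned with 1-D k-means, the resulting bin indices are treated as categorical values, and k-modes (Hamming distance, per-column modes) assigns each row a cluster label that is appended as a new feature. The function names, the deterministic initialisation, the choice of two bins and two clusters, and the toy values are all assumptions made for illustration; none of them come from the paper.

```python
from collections import Counter


def kmeans_1d_bins(values, k, iters=50):
    """Bin one numeric column with 1-D k-means; returns a bin index per value."""
    lo, hi = min(values), max(values)
    # Assumed deterministic init: k centers spread evenly over the value range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty bin keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def hamming(a, b):
    """Number of positions at which two categorical rows differ."""
    return sum(x != y for x, y in zip(a, b))


def kmodes(rows, k, iters=50):
    """Cluster categorical rows (tuples) with k-modes; returns a label per row."""
    # Assumed deterministic init: the first k distinct rows act as initial modes.
    modes = list(dict.fromkeys(rows))[:k]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(len(modes)), key=lambda c: hamming(r, modes[c])) for r in rows]
        new_modes = []
        for c in range(len(modes)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                # New mode: the most frequent category in each column.
                new_modes.append(tuple(Counter(col).most_common(1)[0][0] for col in zip(*members)))
            else:
                new_modes.append(modes[c])
        if new_modes == modes:
            break
        modes = new_modes
    return labels


def add_cluster_feature(numeric_rows, n_bins, n_clusters):
    """Append a pre-clustering label to every row of a numeric table."""
    columns = list(zip(*numeric_rows))
    binned_columns = [kmeans_1d_bins(list(col), n_bins) for col in columns]
    categorical_rows = [tuple(r) for r in zip(*binned_columns)]
    cluster_labels = kmodes(categorical_rows, n_clusters)
    return [tuple(row) + (lab,) for row, lab in zip(numeric_rows, cluster_labels)]


# Toy values for two hypothetical biochemistry columns (not real patient data).
toy_rows = [(20.0, 25.0), (22.0, 24.0), (21.0, 26.0),
            (95.0, 110.0), (98.0, 105.0), (97.0, 108.0)]
augmented = add_cluster_feature(toy_rows, n_bins=2, n_clusters=2)
print([row[-1] for row in augmented])
```

In a pipeline like the one the abstract describes, the augmented rows would then be passed to the stacked meta-model in place of the original feature table.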

Pages: 17
