Improving genetic variant identification for quantitative traits using ensemble learning-based approaches

Cited by: 0

Authors:

Sharma, Jyoti [1]

Jangale, Vaishnavi [1]

Shekhawat, Rajveer Singh [1]

Yadav, Pankaj [1,2]

Affiliations:

[1] Indian Inst Technol, Dept Biosci & Bioengn, Jodhpur 342030, Rajasthan, India

[2] Indian Inst Technol, Sch Artificial Intelligence & Data Sci, Jodhpur 342030, Rajasthan, India

Source:

BMC GENOMICS | 2025 / Volume 26 / Issue 01

Keywords:

Genome-wide association studies; Machine learning; Feature selection; Elastic-net; Support vector regression; Functional enrichment; GENOME-WIDE ASSOCIATION; CHOLESTEROL; REGRESSION; SELECTION; HERITABILITY; LOCI; EXPRESSION; IDENTIFY; DATABASE; OBESITY;

DOI:

10.1186/s12864-025-11443-x

Chinese Library Classification (CLC) number:

Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];

Discipline classification code:

071005 ; 0836 ; 090102 ; 100705 ;

Abstract:

Background: Genome-wide association studies (GWAS) are rapidly advancing due to the improved resolution and completeness provided by Telomere-to-Telomere (T2T) and pangenome assemblies. While recent advancements in GWAS methods have primarily focused on identifying genetic variants associated with discrete phenotypes, approaches for quantitative traits (QTs) remain underdeveloped. As a result, significant variants are often overlooked because of biases from genotype multicollinearity and strict p-value thresholds.

Results: We propose an enhanced ensemble learning approach for QT analysis that integrates regularized variant selection with machine learning-based association methods, validated through comprehensive biological enrichment analysis. We benchmarked four widely recognized single nucleotide polymorphism (SNP) feature selection methods (least absolute shrinkage and selection operator, ridge regression, elastic-net, and mutual information) alongside four association methods: linear regression, random forest, support vector regression (SVR), and XGBoost. The approach is evaluated on simulated datasets and validated using a subset of the PennCATH real dataset, including imputed versions, focusing on low-density lipoprotein (LDL)-cholesterol levels as a QT. The combination of elastic-net with SVR outperformed other methods across all datasets. Functional annotation of the top 100 SNPs identified through this best-performing ensemble method revealed their expression in tissues involved in LDL cholesterol regulation. We also confirmed the involvement of six known genes (APOB, TRAPPC9, RAB2A, CCL24, FCHO2, and EEPD1) in cholesterol-related pathways and identified potential drug targets, including APOB, PTK2B, and PTPN12.

Conclusions: Our ensemble learning approach effectively identifies variants associated with QTs, and we expect its performance to improve further with the integration of T2T and pangenome references in future GWAS.
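The regularized variant selection step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain-Python coordinate-descent elastic-net run on simulated genotypes, and every name, effect size, and parameter value here (`elastic_net`, `lam`, `alpha`, the two causal SNPs, the correlated copy at index 1) is an assumption chosen for illustration. In practice a library implementation would normally be used, and the selected SNPs would then be passed to an association model such as SVR.

```python
import random

def soft_threshold(z, g):
    """Shrink z toward zero by g (the L1 proximal step)."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0

def standardize(cols):
    """Center each column and scale it to unit mean square."""
    out = []
    for c in cols:
        n = len(c)
        m = sum(c) / n
        sd = (sum((v - m) ** 2 for v in c) / n) ** 0.5
        out.append([(v - m) / sd if sd > 0 else 0.0 for v in c])
    return out

def elastic_net(cols, y, lam=0.2, alpha=0.5, n_iter=200, tol=1e-8):
    """Coordinate descent minimizing
    (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    Assumes standardized columns and a centered response."""
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)  # residual y - Xb, kept up to date
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            xj = cols[j]
            # Correlation of x_j with the partial residual (x_j has mean square 1)
            rho = sum(xj[i] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * xj[i]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

# Hypothetical simulated data: 200 individuals, 30 SNPs coded 0/1/2.
rng = random.Random(0)
n_ind, n_snp, maf = 200, 30, 0.3
geno = [[sum(rng.random() < maf for _ in range(2)) for _ in range(n_ind)]
        for _ in range(n_snp)]
# SNP 1 mostly copies SNP 0, mimicking genotype multicollinearity (LD).
geno[1] = [g if rng.random() < 0.95 else sum(rng.random() < maf for _ in range(2))
           for g in geno[0]]
# Quantitative trait driven by SNP 0 and SNP 5 plus Gaussian noise.
trait = [1.0 * geno[0][i] - 0.8 * geno[5][i] + rng.gauss(0.0, 0.5)
         for i in range(n_ind)]

cols = standardize(geno)
t_mean = sum(trait) / n_ind
y = [t - t_mean for t in trait]

beta = elastic_net(cols, y)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("SNPs kept by elastic-net:", selected)
```

The ridge part of the penalty is what lets correlated SNPs share an effect instead of one being dropped arbitrarily, which is the usual motivation for preferring elastic-net over a pure lasso when genotypes are collinear.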


Pages: 17

