Improving genetic variant identification for quantitative traits using ensemble learning-based approaches

Authors
Sharma, Jyoti [1]
Jangale, Vaishnavi [1]
Shekhawat, Rajveer Singh [1]
Yadav, Pankaj [1, 2]
Affiliations
[1] Indian Inst Technol, Dept Biosci & Bioengn, Jodhpur 342030, Rajasthan, India
[2] Indian Inst Technol, Sch Artificial Intelligence & Data Sci, Jodhpur 342030, Rajasthan, India
Source
BMC Genomics | 2025 / Volume 26 / Issue 01
Keywords
Genome-wide association studies; Machine learning; Feature selection; Elastic-net; Support vector regression; Functional enrichment; GENOME-WIDE ASSOCIATION; CHOLESTEROL; REGRESSION; SELECTION; HERITABILITY; LOCI; EXPRESSION; IDENTIFY; DATABASE; OBESITY;
DOI
10.1186/s12864-025-11443-x
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Genome-wide association studies (GWAS) are rapidly advancing due to the improved resolution and completeness provided by Telomere-to-Telomere (T2T) and pangenome assemblies. While recent advancements in GWAS methods have primarily focused on identifying genetic variants associated with discrete phenotypes, approaches for quantitative traits (QTs) remain underdeveloped. This has often led to significant variants being overlooked due to biases from genotype multicollinearity and strict p-value thresholds.
Results: We propose an enhanced ensemble learning approach for QT analysis that integrates regularized variant selection with machine learning-based association methods, validated through comprehensive biological enrichment analysis. We benchmarked four widely recognized single nucleotide polymorphism (SNP) feature selection methods (least absolute shrinkage and selection operator, ridge regression, elastic-net, and mutual information) alongside four association methods: linear regression, random forest, support vector regression (SVR), and XGBoost. Our approach is evaluated on simulated datasets and validated using a subset of the PennCATH real dataset, including imputed versions, focusing on low-density lipoprotein (LDL)-cholesterol levels as a QT. The combination of elastic-net with SVR outperformed other methods across all datasets. Functional annotation of the top 100 SNPs identified through this superior ensemble method revealed their expression in tissues involved in LDL cholesterol regulation. We also confirmed the involvement of six known genes (APOB, TRAPPC9, RAB2A, CCL24, FCHO2, and EEPD1) in cholesterol-related pathways and identified potential drug targets, including APOB, PTK2B, and PTPN12.
Conclusions: Our ensemble learning approach effectively identifies variants associated with QTs, and we expect its performance to improve further with the integration of T2T and pangenome references in future GWAS.
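The elastic-net plus SVR combination described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the genotype matrix is synthetic (random 0/1/2 allele counts), the effect sizes, sample sizes, and hyperparameters (`alpha`, `l1_ratio`, `C`, the linear kernel, the `1e-5` selection threshold) are all assumptions chosen for illustration, and the abstract does not specify how the two stages were actually wired together.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for a genotype matrix (assumption, not PennCATH data):
# rows are individuals, columns are SNPs coded as 0/1/2 minor-allele counts.
rng = np.random.default_rng(0)
n_samples, n_snps, n_causal = 300, 200, 10
X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)

# Hypothetical quantitative trait: the first n_causal SNPs have additive effects.
effects = np.zeros(n_snps)
effects[:n_causal] = np.linspace(0.5, 1.5, n_causal)
y = X @ effects + rng.normal(0.0, 1.0, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Stage 1: elastic-net keeps SNPs whose coefficient magnitude exceeds the threshold.
# Stage 2: support vector regression models the trait on the retained SNPs.
pipeline = make_pipeline(
    StandardScaler(),
    SelectFromModel(
        ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000),
        threshold=1e-5,
    ),
    SVR(kernel="linear", C=1.0),
)
pipeline.fit(X_train, y_train)

# Boolean mask over the original SNP columns, and held-out R^2 of the SVR stage.
selected = pipeline.named_steps["selectfrommodel"].get_support()
r2 = pipeline.score(X_test, y_test)
print(f"SNPs retained by elastic-net: {int(selected.sum())} of {n_snps}")
print(f"Held-out R^2: {r2:.3f}")
```

Wrapping both stages in one pipeline means the selection step is refit on the training split only, which avoids leaking held-out individuals into the choice of SNPs. In a real analysis the hyperparameters would presumably be tuned by cross-validation rather than fixed as here.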
Pages: 17