Improving genetic variant identification for quantitative traits using ensemble learning-based approaches

Cited by: 0
Authors
Sharma, Jyoti [1]
Jangale, Vaishnavi [1]
Shekhawat, Rajveer Singh [1]
Yadav, Pankaj [1, 2]
Affiliations
[1] Indian Inst Technol, Dept Biosci & Bioengn, Jodhpur 342030, Rajasthan, India
[2] Indian Inst Technol, Sch Artificial Intelligence & Data Sci, Jodhpur 342030, Rajasthan, India
Source
BMC GENOMICS | 2025 / Volume 26 / Issue 01
Keywords
Genome-wide association studies; Machine learning; Feature selection; Elastic-net; Support vector regression; Functional enrichment; GENOME-WIDE ASSOCIATION; CHOLESTEROL; REGRESSION; SELECTION; HERITABILITY; LOCI; EXPRESSION; IDENTIFY; DATABASE; OBESITY;
DOI
10.1186/s12864-025-11443-x
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Genome-wide association studies (GWAS) are rapidly advancing due to the improved resolution and completeness provided by Telomere-to-Telomere (T2T) and pangenome assemblies. While recent advancements in GWAS methods have primarily focused on identifying genetic variants associated with discrete phenotypes, approaches for quantitative traits (QTs) remain underdeveloped. This has often led to significant variants being overlooked due to biases from genotype multicollinearity and strict p-value thresholds.
Results: We propose an enhanced ensemble learning approach for QT analysis that integrates regularized variant selection with machine learning-based association methods, validated through comprehensive biological enrichment analysis. We benchmarked four widely recognized single nucleotide polymorphism (SNP) feature selection methods (least absolute shrinkage and selection operator, ridge regression, elastic-net, and mutual information) alongside four association methods: linear regression, random forest, support vector regression (SVR), and XGBoost. Our approach is evaluated on simulated datasets and validated using a subset of the PennCATH real dataset, including imputed versions, focusing on low-density lipoprotein (LDL)-cholesterol levels as a QT. The combination of elastic-net with SVR outperformed other methods across all datasets. Functional annotation of the top 100 SNPs identified through this best-performing ensemble method revealed their expression in tissues involved in LDL cholesterol regulation. We also confirmed the involvement of six known genes (APOB, TRAPPC9, RAB2A, CCL24, FCHO2, and EEPD1) in cholesterol-related pathways and identified potential drug targets, including APOB, PTK2B, and PTPN12.
Conclusions: Our ensemble learning approach effectively identifies variants associated with QTs, and we expect its performance to improve further with the integration of T2T and pangenome references in future GWAS.
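The two-stage idea described in the abstract (elastic-net variant selection followed by SVR for association) can be illustrated with a minimal sketch. This is not the authors' pipeline: the simulated genotypes, the effect sizes, the hyperparameters (`alpha`, `l1_ratio`, `C`), and the use of scikit-learn are all assumptions chosen only to keep the example small and runnable.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Hypothetical simulated data: 300 individuals, 200 SNPs coded as 0/1/2
# minor-allele counts, of which the first 5 are assumed to be causal.
n_samples, n_snps, n_causal = 300, 200, 5
genotypes = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
causal_idx = np.arange(n_causal)
effects = np.full(n_causal, 1.0)  # assumed additive effect per allele
trait = genotypes[:, causal_idx] @ effects + rng.normal(0.0, 0.5, n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    genotypes, trait, test_size=0.25, random_state=0
)

# Stage 1: elastic-net as a feature selector. SNPs whose coefficient is
# shrunk exactly to zero are dropped; the rest are carried forward.
selector = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000)
selector.fit(X_train, y_train)
selected = np.flatnonzero(selector.coef_)

# Stage 2: support vector regression fitted on the selected SNPs only.
model = SVR(kernel="linear", C=1.0)
model.fit(X_train[:, selected], y_train)
r2 = r2_score(y_test, model.predict(X_test[:, selected]))

print(f"SNPs retained by elastic-net: {selected.size} of {n_snps}")
print(f"Held-out R^2 of SVR on retained SNPs: {r2:.3f}")
```

In a real analysis the penalty strength would normally be tuned by cross-validation, and the selection step would be run inside each fold so that the held-out score is not inflated by selecting features on the full dataset.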
Pages: 17