Improving genetic variant identification for quantitative traits using ensemble learning-based approaches

Cited by: 0
Authors
Sharma, Jyoti [1]
Jangale, Vaishnavi [1]
Shekhawat, Rajveer Singh [1]
Yadav, Pankaj [1, 2]
Affiliations
[1] Indian Inst Technol, Dept Biosci & Bioengn, Jodhpur 342030, Rajasthan, India
[2] Indian Inst Technol, Sch Artificial Intelligence & Data Sci, Jodhpur 342030, Rajasthan, India
Source
BMC GENOMICS | 2025 / Vol. 26 / Issue 01
Keywords
Genome-wide association studies; Machine learning; Feature selection; Elastic-net; Support vector regression; Functional enrichment; GENOME-WIDE ASSOCIATION; CHOLESTEROL; REGRESSION; SELECTION; HERITABILITY; LOCI; EXPRESSION; IDENTIFY; DATABASE; OBESITY;
DOI
10.1186/s12864-025-11443-x
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Genome-wide association studies (GWAS) are rapidly advancing due to the improved resolution and completeness provided by Telomere-to-Telomere (T2T) and pangenome assemblies. While recent advancements in GWAS methods have primarily focused on identifying genetic variants associated with discrete phenotypes, approaches for quantitative traits (QTs) remain underdeveloped. This has often led to significant variants being overlooked due to biases from genotype multicollinearity and strict p-value thresholds.
Results: We propose an enhanced ensemble learning approach for QT analysis that integrates regularized variant selection with machine learning-based association methods, validated through comprehensive biological enrichment analysis. We benchmarked four widely recognized single nucleotide polymorphism (SNP) feature selection methods (least absolute shrinkage and selection operator, ridge regression, elastic-net, and mutual information) alongside four association methods: linear regression, random forest, support vector regression (SVR), and XGBoost. Our approach was evaluated on simulated datasets and validated using a subset of the PennCATH real dataset, including imputed versions, focusing on low-density lipoprotein (LDL)-cholesterol levels as a QT. The combination of elastic-net with SVR outperformed other methods across all datasets. Functional annotation of the top 100 SNPs identified through this superior ensemble method revealed their expression in tissues involved in LDL cholesterol regulation. We also confirmed the involvement of six known genes (APOB, TRAPPC9, RAB2A, CCL24, FCHO2, and EEPD1) in cholesterol-related pathways and identified potential drug targets, including APOB, PTK2B, and PTPN12.
Conclusions: Our ensemble learning approach effectively identifies variants associated with QTs, and we expect its performance to improve further with the integration of T2T and pangenome references in future GWAS.
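Illustrative sketch (not the authors' code): the abstract's best-performing combination, elastic-net variant selection followed by support vector regression, might look roughly like the following with scikit-learn. Everything concrete here is an assumption made for illustration: the synthetic 0/1/2 genotype matrix, the five hypothetical causal SNP indices and effect sizes, and the hyperparameters (`alpha`, `l1_ratio`, `C`, `epsilon`, linear kernel). The record does not state the settings or data handling used in the paper.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic genotypes coded as minor-allele counts (0, 1, 2); assumed sizes.
n_samples, n_snps = 400, 200
X = rng.binomial(2, 0.3, size=(n_samples, n_snps)).astype(float)

# Hypothetical causal SNPs and effect sizes used to simulate a quantitative trait.
causal = np.array([3, 17, 42, 88, 150])
effects = np.array([1.0, -0.8, 0.9, 0.7, -1.1])
y = X[:, causal] @ effects + rng.normal(0.0, 0.5, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: elastic-net as a feature selector. SNPs whose coefficient is
# shrunk exactly to zero are dropped; the rest are kept.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000)
enet.fit(X_train, y_train)
selected = np.flatnonzero(enet.coef_)

# Step 2: support vector regression fitted on the selected SNPs only.
svr = SVR(kernel="linear", C=1.0, epsilon=0.1)
svr.fit(X_train[:, selected], y_train)

# Held-out R^2 as a simple measure of how well the selected SNPs predict the trait.
r2 = r2_score(y_test, svr.predict(X_test[:, selected]))

print("SNPs kept by elastic-net:", len(selected), "of", n_snps)
print("Held-out R^2 of SVR on selected SNPs:", round(r2, 3))
```

In a sketch like this, the mixed L1/L2 penalty is what lets the selector cope with correlated genotypes: the L1 part zeroes out uninformative SNPs, while the L2 part tends to keep groups of correlated SNPs together rather than picking one arbitrarily.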
Pages: 17
Related papers
50 in total
  • [31] Study of Ensemble Learning-Based Fusion Prognostics
    Sun Jianzhong
    Zuo Hongfu
    Yang Haibin
    Pecht, Michael
    2010 PROGNOSTICS AND SYSTEM HEALTH MANAGEMENT CONFERENCE, 2010, : 82 - +
  • [32] GENETIC DISTANCES BASED ON QUANTITATIVE TRAITS
    CAMUSSI, A
    OTTAVIANO, E
    CALINSKI, T
    KACZMAREK, Z
    GENETICS, 1985, 111 (04) : 945 - 962
  • [33] Ensemble learning-based multimodal data analysis improving the diagnostic accuracy of Alzheimer's disease
    Wu, Junjiang
    Zhang, Hengchao
    Zhu, Xiaolong
    Zhang, Yan
    Ding, Xuemei
    Yang, Hongqin
    OPTICS IN HEALTH CARE AND BIOMEDICAL OPTICS XIII, 2023, 12770
  • [34] EMPIRICAL COMPARISON AND ANALYSIS OF MACHINE LEARNING-BASED APPROACHES FOR DRUGGABLE PROTEIN IDENTIFICATION
    Shoombuatong, Watshara
    Schaduangrat, Nalini
    Nikom, Jaru
    EXCLI JOURNAL, 2023, 22 : 915 - 927
  • [35] Improving Deep Learning-Based UWB LOS/NLOS Identification with Transfer Learning: An Empirical Approach
    Park, JiWoong
    Nam, SungChan
    Choi, HongBeom
    Ko, YoungEun
    Ko, Young-Bae
    ELECTRONICS, 2020, 9 (10) : 1 - 13
  • [36] Genetic classification of various familial relationships using the stacking ensemble machine learning approaches
    Jeong, Su Jin
    Lee, Hyo-Jung
    Lee, Soong Deok
    Park, Ji Eun
    Lee, Jae Won
    COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2024, 31 (03) : 279 - 289
  • [37] Ensemble learning-based prediction of contentment score using social multimedia in education
    Kaur, Maninder
    Mehta, Himika
    Randhawa, Sukhchandan
    Sharma, Pradip Kumar
    Park, Jong Hyuk
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (26-27) : 34423 - 34440
  • [38] Ensemble learning-based intelligent fault diagnosis method using feature partitioning
    Zhu, Yongsheng
    Zhu, Xiaoran
    Wang, Jing
    JOURNAL OF VIBROENGINEERING, 2013, 15 (03) : 1378 - 1392
  • [39] Improving Adversarial Attacks with Ensemble-Based Approaches
    Ji, Yapeng
    Zhou, Guoxu
    ARTIFICIAL INTELLIGENCE, CICAI 2022, PT II, 2022, 13605 : 15 - 29