Optimizing UK biobank cloud-based research analysis platform to fine-map coronary artery disease loci in whole genome sequencing data

Cited by: 0
Authors
Sng, Letitia M. F. [1]
Kaphle, Anubhav [2]
O'Brien, Mitchell J. [1]
Hosking, Brendan [1]
Reguant, Roc [1]
Verjans, Johan [3, 4, 5]
Jain, Yatish [2, 6]
Twine, Natalie A. [1, 6]
Bauer, Denis C. [6, 7, 8]
Affiliations
[1] Commonwealth Sci & Ind Res Org CSIRO, Australian E Hlth Res Ctr, Westmead, NSW, Australia
[2] Commonwealth Sci & Ind Res Org CSIRO, Australian E Hlth Res Ctr, Melbourne, Vic, Australia
[3] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
[4] South Australian Hlth & Med Res Inst, Lifelong Hlth, Adelaide, SA, Australia
[5] Royal Adelaide Hosp, Cent Adelaide Hlth Network, Adelaide, SA, Australia
[6] Macquarie Univ, Fac Sci & Engn, Appl BioSci, Macquarie Pk, Sydney, NSW, Australia
[7] Commonwealth Sci & Ind Res Org CSIRO, Australian E Hlth Res Ctr, Adelaide, SA, Australia
[8] Univ Sydney, Sch Med Sci, Dept Biomed Informat & Digital Hlth, Sydney, Australia
Source
SCIENTIFIC REPORTS | 2025 / Vol. 15 / Issue 01
Keywords
Population-scale genetics; UK Biobank; DNAnexus; Cloud-computing; GWAS; Trusted research environments; CHROMOSOME; 9P21; SUSCEPTIBILITY;
DOI
10.1038/s41598-025-95286-2
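Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences (General)];
Discipline classification code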
07 ; 0710 ; 09 ;
Abstract
We conducted the first comprehensive association analysis of a coronary artery disease (CAD) cohort within the recently released UK Biobank (UKB) whole genome sequencing dataset. We employed the fine-mapping tool PolyFun and pinpointed rs10757274 as the most likely causal SNV within the 9p21.3 CAD risk locus. Notably, we show that the machine-learning (ML) approaches REGENIE and VariantSpark exhibited greater sensitivity than traditional single-SNV logistic regression, uncovering rs28451064, a known risk locus in 21q22.11. Our findings underscore the utility of leveraging advanced computational techniques and cloud-based resources for mega-biobank analyses. Aligning with the paradigm shift of bringing compute to data, we demonstrate a 44% cost reduction and 94% speedup through compute architecture optimisation on UK Biobank's Research Analysis Platform using our RAPpoet approach. We discuss three considerations for researchers implementing novel workflows for datasets hosted on cloud platforms, to pave the way for harnessing mega-biobank-sized data through scalable, cost-effective cloud computing solutions.
Pages: 9