Benchmarking DNA large language models on quadruplexes

Cited by: 0

Authors:

Cherednichenko, Oleksandr [1]

Herbert, Alan [1,2]

Poptsova, Maria [1]
Affiliations:

[1] HSE Univ, Int Lab Bioinformat, Moscow, Russia

[2] InsideOutBio, Charlestown, MA USA

Source:

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL | 2025 / Vol. 27

Keywords:

Foundation model; Large language model; DNABERT; HyenaDNA; MAMBA-DNA; Caduceus; Flipons; Non-B DNA; G-quadruplexes

DOI:

10.1016/j.csbj.2025.03.007

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]; Q7 [Molecular Biology]

Subject classification code:

071010; 081704

Abstract:

Large language models (LLMs) in genomics have successfully predicted various functional genomic elements. While their performance is typically evaluated on genomic benchmark datasets, it remains unclear which LLM is best suited to specific downstream tasks, particularly the generation of whole-genome annotations. Current genomic LLMs fall into three main categories: transformer-based models, long-convolution-based models, and state-space models (SSMs). In this study, we benchmarked these three types of LLM architecture for generating whole-genome maps of G-quadruplexes (GQs), a type of flipon, or non-B DNA structure, characterized by distinctive sequence patterns and functional roles in diverse regulatory contexts. Although GQs form by folding guanosine residues into tetrads, the computational task is challenging because the bases involved may lie on different strands, be separated by a large number of nucleotides, or come from RNA rather than DNA. All LLMs performed comparably well, with DNABERT-2 and HyenaDNA achieving the best results by F1 score and Matthews correlation coefficient (MCC). Analysis of the whole-genome annotations revealed that HyenaDNA recovered more quadruplexes in distal enhancers and intronic regions. The models were better suited to detecting large GQ arrays, which likely contribute to the nuclear condensates involved in gene transcription and chromosomal scaffolds. In the de novo quadruplexes generated by the models, HyenaDNA and Caduceus formed a separate group, while the transformer-based models clustered together. Overall, our findings suggest that different types of LLMs complement each other: genomic architectures with different context lengths can detect distinct functional regulatory elements, underscoring the importance of selecting the appropriate model for the specific genomic task. The code and data underlying this article are available at https://github.com/powidla/G4s-FMs
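The abstract ranks the models by F1 and MCC. As a minimal sketch, assuming predictions are scored as binary per-nucleotide labels (1 = inside a quadruplex, 0 = outside), both metrics can be computed from confusion-matrix counts as below. The helper names (`confusion_counts`, `f1_score`, `mcc`) are hypothetical and are not taken from the G4s-FMs repository, and the toy labels are invented for illustration; they are not data from the study.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for two equal-length sequences of 0/1 labels.

    Hypothetical helper: assumes per-position binary labels, which may differ
    from the evaluation protocol used in the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t and p:
            tp += 1  # quadruplex position correctly predicted
        elif p:
            fp += 1  # predicted quadruplex where there is none
        elif t:
            fn += 1  # missed quadruplex position
        else:
            tn += 1  # correctly predicted background
    return tp, fp, fn, tn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN); 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Invented toy labels for ten positions (not from the study).
    y_true = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    print(f"F1={f1_score(tp, fp, fn):.3f} MCC={mcc(tp, fp, fn, tn):.3f}")
    # prints "F1=0.750 MCC=0.583"
```

Because F1 ignores true negatives while MCC uses all four counts, reporting both gives a fuller picture when positive positions are rare, as is typical for genome-wide annotation.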


Pages: 992-1000

Number of pages: 9
