Benchmarking DNA large language models on quadruplexes

Cited by: 0
Authors
Cherednichenko, Oleksandr [1 ]
Herbert, Alan [1 ,2 ]
Poptsova, Maria [1 ]
Affiliations
[1] HSE Univ, Int Lab Bioinformat, Moscow, Russia
[2] InsideOutBio, Charlestown, MA USA
Keywords
Foundation model; Large language model; DNABERT; HyenaDNA; MAMBA-DNA; Caduceus; Flipons; Non-B DNA; G-quadruplexes
DOI
10.1016/j.csbj.2025.03.007
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
Large language models (LLMs) in genomics have successfully predicted various functional genomic elements. While their performance is typically evaluated on genomic benchmark datasets, it remains unclear which LLM is best suited for specific downstream tasks, particularly for generating whole-genome annotations. Current LLMs in genomics fall into three main categories: transformer-based models, long convolution-based models, and state-space models (SSMs). In this study, we benchmarked these three types of LLM architectures for generating whole-genome maps of G-quadruplexes (GQ), a type of flipon, or non-B DNA structure, characterized by distinctive patterns and functional roles in diverse regulatory contexts. Although GQs form by folding guanosine residues into tetrads, the computational task is challenging because the bases involved may lie on different strands, be separated by a large number of nucleotides, or be made from RNA rather than DNA. All LLMs performed comparably well, with DNABERT-2 and HyenaDNA achieving superior results based on F1 and MCC. Analysis of the whole-genome annotations revealed that HyenaDNA recovered more quadruplexes in distal enhancers and intronic regions. The models were better suited to detecting large GQ arrays that likely contribute to the nuclear condensates involved in gene transcription and to chromosomal scaffolds. In de novo quadruplex generation, HyenaDNA and Caduceus formed a separate grouping, while transformer-based models clustered together. Overall, our findings suggest that different types of LLMs complement each other: genomic architectures with varying context lengths detect distinct functional regulatory elements, underscoring the importance of selecting the appropriate model for the specific genomic task. The code and data underlying this article are available at https://github.com/powidla/G4s-FMs.
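The models above are ranked by F1 and MCC on binary G-quadruplex classification. As a minimal sketch of how those two metrics relate to the confusion counts (the labels and predictions below are toy values, not data from the paper):

```python
import math

def confusion_counts(y_true, y_pred):
    """Tally TP/FP/FN/TN for binary labels (1 = G-quadruplex, 0 = background)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall; ignores true negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; uses all four cells, so it is
    more informative than F1 when classes are imbalanced."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy example: 8 candidate sequences with ground-truth and predicted labels.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"F1 = {f1_score(tp, fp, fn):.3f}, MCC = {mcc(tp, fp, fn, tn):.3f}")
```

Because GQ-forming loci are rare relative to background genomic sequence, MCC is the more conservative of the two scores, which is presumably why the paper reports both.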
Pages: 992-1000
Page count: 9