Applications of transformer-based language models in bioinformatics: a survey

Cited by: 52
Authors
Zhang, Shuang [1 ]
Fan, Rui [1 ]
Liu, Yuti [1 ]
Chen, Shuang [1 ]
Liu, Qiao
Zeng, Wanwen [1 ,2 ]
Affiliations
[1] Nankai Univ, Coll Software, Tianjin 300350, Peoples R China
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
National Natural Science Foundation of China;
Keywords
GENE-EXPRESSION DATA; PROTEINS;
DOI
10.1093/bioadv/vbad001
Chinese Library Classification
R73 [Oncology];
Discipline code
100214;
Abstract
Transformer-based language models, including the vanilla transformer, BERT and GPT-3, have achieved revolutionary breakthroughs in the field of natural language processing (NLP). Because various biological sequences share inherent similarities with natural languages, the remarkable interpretability and adaptability of these models have prompted a new wave of their application in bioinformatics research. To provide a timely and comprehensive review, we introduce key developments of transformer-based language models by describing the detailed structure of transformers, and we summarize their contributions to a wide range of bioinformatics research, from basic sequence analysis to drug discovery. While transformer-based applications in bioinformatics are diverse and multifaceted, we identify and discuss the common challenges, including heterogeneity of training data, computational expense and model interpretability, as well as opportunities in the context of bioinformatics research. We hope that the broader community of NLP researchers, bioinformaticians and biologists can be brought together to foster future research and development in transformer-based language models, and to inspire novel bioinformatics applications that are unattainable by traditional methods.
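The "detailed structure of transformers" the abstract refers to is built around scaled dot-product self-attention, which applies equally to word tokens and to biological sequence tokens (e.g. residues or nucleotides). A minimal NumPy sketch of that core operation, softmax(QK^T / sqrt(d_k))·V (illustrative only; the function name and toy dimensions are assumptions, not from the survey):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise query-key similarities
    # Numerically stable row-wise softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                               # attention-weighted sum of values

# Toy self-attention: 4 hypothetical sequence tokens with 8-dim embeddings,
# using the same matrix for queries, keys and values (Q = K = V = X)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (4, 8): one contextualized vector per input token
```

In a full transformer this operation is repeated across multiple heads and layers with learned projection matrices for Q, K and V; the sketch above isolates only the attention mechanism itself.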
Pages: 19