BioInstruct: instruction tuning of large language models for biomedical natural language processing

Cited by: 1
Authors
Tran, Hieu [1 ]
Yang, Zhichao [1 ]
Yao, Zonghai [1 ]
Yu, Hong [1 ,2 ,3 ,4 ]
Affiliations
[1] Univ Massachusetts, Manning Coll Informat & Comp Sci, Amherst, MA USA
[2] Univ Massachusetts, Med Sch, Dept Med, Worcester, MA 01655 USA
[3] Univ Massachusetts Lowell, Ctr Biomed & Hlth Res Data Sci, Miner Sch Comp & Informat Sci, Dandeneau Hall-3rd Floor, One Univ Ave, Lowell, MA 01854 USA
[4] Ctr Healthcare Org & Implementat Res, VA Bedford Hlth Care, Bedford, MA 01730 USA
Funding
US National Institutes of Health;
Keywords
instruction tuning; large language models; question answering; natural language inference; information extraction; text generation; multi-task learning;
DOI
10.1093/jamia/ocae122
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Objectives: To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles.
Materials and Methods: We created BioInstruct, a dataset of 25 005 instructions for instruction-tuning LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were created by prompting the GPT-4 language model with 3 seed samples randomly drawn from 80 human-curated instructions. We employed Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated these instruction-tuned LLMs on several BioNLP tasks, which can be grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether the category (eg, QA, IE, or generation) of instructions affects model performance.
Results and Discussion: Compared with LLMs without instruction tuning, our instruction-tuned LLMs demonstrated marked performance gains: 17.3% in QA on average accuracy, 5.7% in IE on average F1, and 96% in generation tasks on average GPT-4 score. Our 7B-parameter instruction-tuned LLaMA 1 model was competitive with, or even surpassed, other biomedical-domain LLMs that were also fine-tuned from LLaMA 1 with vast domain-specific data or a wide variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted with closely related tasks. These findings align with observations from multi-task learning, suggesting synergies between the 2 tasks.
Conclusion: The BioInstruct dataset serves as a valuable resource, and instruction-tuned LLMs lead to the best-performing BioNLP applications.
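The method described above has two steps: bootstrapping instructions by prompting GPT-4 with a few seed samples, and LoRA-based parameter-efficient fine-tuning of LLaMA on the resulting dataset. A minimal Python sketch of the fine-tuning step, assuming the Hugging Face transformers and peft libraries, is shown below; the checkpoint name, LoRA hyperparameters (r, lora_alpha, target modules), and prompt template are illustrative assumptions, not the configuration reported in the paper.

# Minimal sketch (not the authors' released code) of LoRA instruction tuning
# for a LLaMA-style causal LM. Hyperparameters and template are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"          # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Parameter-efficient fine-tuning: freeze the base weights and train
# low-rank adapters on the attention projections only.
lora_config = LoraConfig(
    r=16,                                        # assumed adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],         # assumed target layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()               # only a small fraction is trainable

def format_example(ex: dict) -> str:
    """Render one BioInstruct-style record (instruction/input/output) as a
    single training string; the exact template is an assumption."""
    prompt = f"### Instruction:\n{ex['instruction']}\n"
    if ex.get("input"):
        prompt += f"### Input:\n{ex['input']}\n"
    return prompt + f"### Response:\n{ex['output']}"

Training would then proceed with a standard causal-language-modeling loss over the formatted strings (eg, via transformers.Trainer), after which the adapters can be merged into the base model for evaluation on the QA, IE, and generation benchmarks.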
Pages: 1821 - 1832
Number of pages: 12
Related Articles
50 records in total
  • [1] Fine-tuning large neural language models for biomedical natural language processing
    Tinn, Robert
    Cheng, Hao
    Gu, Yu
    Usuyama, Naoto
    Liu, Xiaodong
    Naumann, Tristan
    Gao, Jianfeng
    Poon, Hoifung
    [J]. PATTERNS, 2023, 4 (04):
  • [2] Exploring the effectiveness of instruction tuning in biomedical language processing
    Rohanian, Omid
    Nouriborji, Mohammadmahdi
    Kouchaki, Samaneh
    Nooralahzadeh, Farhad
    Clifton, Lei
    Clifton, David A.
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 158
  • [3] OCTOPACK: INSTRUCTION TUNING CODE LARGE LANGUAGE MODELS
    Muennighoff, Niklas
    Liu, Qian
    Zebaze, Armel
    Zheng, Qinkai
    Hui, Binyuan
    Zhuo, Terry Yue
    Singh, Swayam
    Tang, Xiangru
    von Werra, Leandro
    Longpre, Shayne
    [J]. arXiv, 2023,
  • [4] GraphGPT: Graph Instruction Tuning for Large Language Models
    Tang, Jiabin
    Yang, Yuhao
    Wei, Wei
    Shi, Lei
    Su, Lixin
    Cheng, Suqi
    Yin, Dawei
    Huang, Chao
    [J]. PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 491 - 500
  • [5] Natural language processing in the era of large language models
    Zubiaga, Arkaitz
    [J]. FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2024, 6
  • [6] Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
    Luo, Gen
    Zhou, Yiyi
    Ren, Tianhe
    Chen, Shengxin
    Sun, Xiaoshuai
    Ji, Rongrong
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [7] Biomedical Natural Language Processing
    Hamon, Thierry
    [J]. TRAITEMENT AUTOMATIQUE DES LANGUES, 2013, 54 (03): 77 - 79
  • [8] Biomedical Natural Language Processing
    Kim, Jin-Dong
    [J]. COMPUTATIONAL LINGUISTICS, 2017, 43 (01) : 265 - 267
  • [9] Robustness of GPT Large Language Models on Natural Language Processing Tasks
    Chen, Xuanting
    Ye, Junjie
    Zu, Can
    Xu, Nuo
    Gui, Tao
    Zhang, Qi
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (05): 1128 - 1142
  • [10] ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing
    Neumann, Mark
    King, Daniel
    Beltagy, Iz
    Ammar, Waleed
    [J]. SIGBIOMED WORKSHOP ON BIOMEDICAL NATURAL LANGUAGE PROCESSING (BIONLP 2019), 2019, : 319 - 327