Exploring the effectiveness of instruction tuning in biomedical language processing

Cited by: 1
Authors
Rohanian, Omid [1, 2]
Nouriborji, Mohammadmahdi [2, 3]
Kouchaki, Samaneh [4]
Nooralahzadeh, Farhad [5, 6]
Clifton, Lei [7]
Clifton, David A. [1, 8]
Affiliations
[1] Univ Oxford, Dept Engn Sci, Oxford, England
[2] NLPie Res, Oxford, England
[3] Sharif Univ Technol, Tehran, Iran
[4] Univ Surrey, Dept Elect & Elect Engn, Guildford, England
[5] Univ Zurich, Zurich, Switzerland
[6] Univ Hosp Zurich, Zurich, Switzerland
[7] Univ Oxford, Nuffield Dept Populat Hlth, Oxford, England
[8] Oxford Suzhou Ctr Adv Res, Suzhou, Peoples R China
Keywords
Instruction tuning; Biomedical NLP; Named entity recognition; Relation extraction; Medical NLI; Llama2-MedTuned;
DOI
10.1016/j.artmed.2024.103007
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs), particularly those similar to ChatGPT, have significantly influenced the field of Natural Language Processing (NLP). While these models excel in general language tasks, their performance in domain-specific downstream tasks such as biomedical and clinical Named Entity Recognition (NER), Relation Extraction (RE), and Medical Natural Language Inference (NLI) is still evolving. In this context, our study investigates the potential of instruction tuning for biomedical language processing, applying this technique to two general LLMs of substantial scale. We present a comprehensive, instruction-based model trained on a dataset that consists of approximately 200,000 instruction-focused samples. This dataset represents a carefully curated compilation of existing data, meticulously adapted and reformatted to align with the specific requirements of our instruction-based tasks. This initiative represents an important step in utilising such models to achieve results on par with specialised encoder-only models like BioBERT and BioClinicalBERT for various classical biomedical NLP tasks. Our work includes an analysis of the dataset's composition and its impact on model performance, providing insights into the intricacies of instruction tuning. By sharing our code, models, and the distinctively assembled instruction-based dataset, we seek to encourage ongoing research and development in this area.
Pages: 9
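
Illustrative note: the abstract describes existing annotated data being adapted and reformatted into instruction-style samples. The sketch below shows one way such a conversion could look for an NER example. It assumes BIO-tagged tokens as input, and the prompt wording, field names (`instruction`, `input`, `output`), and the helper names `bio_to_entities` and `to_instruction_sample` are all hypothetical; none of this is taken from the authors' released code or dataset.

```python
from typing import Dict, List


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Collapse BIO tags into entity spans (assumed input format)."""
    entities: List[Dict[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    def flush() -> None:
        # Store the span collected so far, if any.
        if current_tokens:
            entities.append({"text": " ".join(current_tokens), "type": current_type})

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            current_tokens.append(token)
        else:
            # "O" tags, and I- tags with no matching open span, end the span.
            flush()
            current_tokens = []
            current_type = None
    flush()
    return entities


def to_instruction_sample(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Build one instruction-style record; the prompt text is a placeholder."""
    entities = bio_to_entities(tokens, tags)
    output = "; ".join(f"{e['text']} ({e['type']})" for e in entities) or "None"
    return {
        "instruction": (
            "Extract all biomedical named entities from the input text "
            "and give the type of each."
        ),
        "input": " ".join(tokens),
        "output": output,
    }


sample = to_instruction_sample(
    ["Aspirin", "reduces", "the", "risk", "of", "myocardial", "infarction", "."],
    ["B-Chemical", "O", "O", "O", "O", "B-Disease", "I-Disease", "O"],
)
print(sample["output"])  # Aspirin (Chemical); myocardial infarction (Disease)
```

Records of this shape can be serialised one per line (for example as JSON Lines) and fed to a standard supervised fine-tuning loop; the actual prompt templates and label sets used in the paper may differ.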