Unveiling the power of language models in chemical research question answering

Cited by: 0
Authors
Xiuying Chen [1]
Tairan Wang [2]
Taicheng Guo [2]
Kehan Guo [3]
Juexiao Zhou [3]
Haoyang Li [2]
Zirui Song [2]
Xin Gao [1]
Xiangliang Zhang [2]
Affiliations
[1] Mohamed bin Zayed University of Artificial Intelligence
[2] King Abdullah University of Science and Technology
[3] University of Notre Dame
DOI
10.1038/s42004-024-01394-x
Abstract
While the abilities of language models are thoroughly evaluated in areas like general domains and biomedicine, academic chemistry remains less explored. Chemical QA tools also play a crucial role in both education and research by effectively translating complex chemical information into an understandable format. Addressing this gap, we introduce ScholarChemQA, a large-scale QA dataset constructed from chemical papers. Specifically, the questions are from paper titles with a question mark, and the multi-choice answers are reasoned out based on the corresponding abstracts. This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful. Correspondingly, we introduce a ChemMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data. Experiments show that Large Language Models (LLMs) still have significant room for improvement in the field of chemistry. Moreover, ChemMatch significantly outperforms recent similar-scale baselines. Repository: https://github.com/iriscxy/chemmatch.
Related papers
50 items in total
  • [21] QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering
    Yasunaga, Michihiro
    Ren, Hongyu
    Bosselut, Antoine
    Liang, Percy
    Leskovec, Jure
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 535 - 546
  • [22] Research on Question Classification for Automatic Question Answering
    Xu, Shihua
    Cheng, Gang
    Kong, Fang
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2016, : 218 - 221
  • [23] A medical question answering system using large language models and knowledge graphs
    Guo, Quan
    Cao, Shuai
    Yi, Zhang
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (11) : 8548 - 8564
  • [24] JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering
    Sun, Yueqing
    Shi, Qi
    Qi, Le
    Zhang, Yu
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5049 - 5060
  • [25] Incorporating Domain Knowledge and Semantic Information into Language Models for Commonsense Question Answering
    Zhou, Ruiying
    Tian, Keke
    Lai, Hanjiang
    Yin, Jian
    PROCEEDINGS OF THE 2021 IEEE 24TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2021, : 1160 - 1165
  • [26] Toward expert-level medical question answering with large language models
    Karan Singhal
    Tao Tu
    Juraj Gottweis
    Rory Sayres
    Ellery Wulczyn
    Mohamed Amin
    Le Hou
    Kevin Clark
    Stephen R. Pfohl
    Heather Cole-Lewis
    Darlene Neal
    Qazi Mamunur Rashid
    Mike Schaekermann
    Amy Wang
    Dev Dash
    Jonathan H. Chen
    Nigam H. Shah
    Sami Lachgar
    Philip Andrew Mansfield
    Sushant Prakash
    Bradley Green
    Ewa Dominowska
    Blaise Agüera y Arcas
    Nenad Tomašev
    Yun Liu
    Renee Wong
    Christopher Semturs
    S. Sara Mahdavi
    Joelle K. Barral
    Dale R. Webster
    Greg S. Corrado
    Yossi Matias
    Shekoofeh Azizi
    Alan Karthikesalingam
    Vivek Natarajan
    Nature Medicine, 2025, 31 (3) : 943 - 950
  • [27] Leveraging Text-to-Text Pretrained Language Models for Question Answering in Chemistry
    Tran, Dan
    Pascazio, Laura
    Akroyd, Jethro
    Mosbach, Sebastian
    Kraft, Markus
    ACS OMEGA, 2024, 9 (12) : 13883 - 13896
  • [28] UniGen: A Unified Generative Framework for Retrieval and Question Answering with Large Language Models
    Li, Xiaoxi
    Zhou, Yujia
    Dou, Zhicheng
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 8, 2024, : 8688 - 8696
  • [29] Open-Domain Question Answering over Tables with Large Language Models
    Liang, Xinyi
    Hu, Rui
    Liu, Yu
    Zhu, Konglin
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873 : 347 - 358
  • [30] Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering
    Chaturvedi, Akshay
    Bhar, Swarnadeep
    Saha, Soumadeep
    Garain, Utpal
    Asher, Nicholas
    COMPUTATIONAL LINGUISTICS, 2023, 50 (01) : 119 - 155