Leveraging Text-to-Text Pretrained Language Models for Question Answering in Chemistry

Cited by: 3
Authors
Tran, Dan [1]
Pascazio, Laura [1]
Akroyd, Jethro [1,2,3]
Mosbach, Sebastian [1,2,3]
Kraft, Markus [1,2,3,4,5]
Affiliations
[1] Cambridge Ctr Adv Res & Educ Singapore, CARES, Singapore 138602, Singapore
[2] Univ Cambridge, Dept Chem Engn & Biotechnol, Cambridge CB3 0AS, England
[3] CMCL Innovat, Cambridge CB3 0AX, England
[4] Nanyang Technol Univ, Sch Chem & Biomed Engn, Singapore 637459, Singapore
[5] Alan Turing Inst, London NW1 2DB, England
Source
ACS OMEGA | 2024, Vol. 9, Issue 12
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK; National Research Foundation (NRF), Singapore
Keywords
KNOWLEDGE
DOI
10.1021/acsomega.3c08842
Chinese Library Classification (CLC)
O6 [Chemistry]
Subject classification code
0703
Abstract
In this study, we present a question answering (QA) system for chemistry, named Marie, which uses a text-to-text pretrained language model to attain accurate data retrieval. The underlying data store is "The World Avatar" (TWA), a general world model consisting of a knowledge graph that evolves over time. TWA includes information about chemical species such as their chemical and physical properties, applications, and chemical classifications. Building upon our previous work on knowledge graph question answering (KGQA) for chemistry, this advanced version of Marie leverages a fine-tuned Flan-T5 model to seamlessly translate natural language questions into SPARQL queries, with no separate components for entity and relation linking. The developed QA system provides accurate results for complex queries that involve many relation hops and balances correctness and speed for real-world usage. This new approach offers significant advantages over the prior implementation, which relied on knowledge graph embedding. Specifically, the updated system offers high accuracy and great flexibility in accommodating changes and evolution of the data stored in the knowledge graph without necessitating retraining. Our evaluation results underscore the efficacy of the improved system, highlighting its superior accuracy and its ability to answer complex questions compared to its predecessor.
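For readers who want a concrete picture of the approach summarized above, the following is a minimal sketch of the core step: a single pass through a fine-tuned text-to-text model that turns a natural-language chemistry question into a SPARQL query, with no separate entity- or relation-linking components. The checkpoint name, prompt prefix, and example question are illustrative assumptions, not the authors' released model or conventions.

```python
# Minimal sketch of natural-language-question -> SPARQL translation with a
# seq2seq model. Assumptions (not from the paper): the base checkpoint
# "google/flan-t5-base" as a stand-in for the authors' fine-tuned Flan-T5,
# and a made-up "translate to SPARQL:" prompt prefix.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/flan-t5-base"  # a task-specific fine-tune would replace this

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def question_to_sparql(question: str) -> str:
    """Generate a SPARQL query string for a natural-language question."""
    prompt = f"translate to SPARQL: {question}"
    inputs = tokenizer(prompt, return_tensors="pt")
    # One seq2seq generation step stands in for the usual entity-linking,
    # relation-linking, and query-construction pipeline.
    output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Hypothetical question; the generated query would then be executed
    # against the knowledge graph's SPARQL endpoint.
    print(question_to_sparql("What is the boiling point of benzene?"))
```

Because the system emits an executable query rather than an answer derived from learned graph embeddings, the query always runs against the current state of the knowledge graph, which is why data updates do not require retraining, as the abstract notes.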
Pages: 13883-13896
Page count: 14