Poisoning medical knowledge using large language models

Cited: 0
Authors
Yang, Junwei [1]
Xu, Hanwen [2]
Mirzoyan, Srbuhi [1]
Chen, Tong [2]
Liu, Zixuan [2]
Liu, Zequn [1]
Ju, Wei [1]
Liu, Luchen [1]
Xiao, Zhiping [2]
Zhang, Ming [1]
Wang, Sheng [2]
Affiliations
[1] Peking University, School of Computer Science, Anker Embodied AI Lab, State Key Laboratory of Multimedia Information Processing, Beijing, People's Republic of China
[2] University of Washington, Paul G. Allen School of Computer Science & Engineering, Seattle, WA 98195, USA
Funding
National Natural Science Foundation of China
DOI
10.1038/s42256-024-00899-3
CLC classification
TP18 (Theory of artificial intelligence)
Discipline codes
081104; 0812; 0835; 1405
Abstract
Biomedical knowledge graphs (KGs) constructed from medical literature have been widely used to validate biomedical discoveries and generate new hypotheses. Recently, large language models (LLMs) have demonstrated a strong ability to generate human-like text. Although most of this text is benign, LLMs can also be used to produce malicious content. Here we investigate whether a malicious actor could use an LLM to generate a paper that poisons medical KGs and thereby affects downstream biomedical applications. As a proof of concept, we develop Scorpius, a conditional text-generation model that generates a malicious paper abstract conditioned on a promoted drug and a target disease. The goal is to fool a medical KG constructed from a mixture of this malicious abstract and millions of real papers, so that KG consumers misidentify the promoted drug as relevant to the target disease. We evaluated Scorpius on a KG constructed from 3,818,528 papers and found that adding a single malicious abstract raised the relevance of 71.3% of drug-disease pairs from the top 1,000 into the top ten. Moreover, abstracts generated by Scorpius achieve better perplexity than those generated by ChatGPT, suggesting that such malicious abstracts cannot be efficiently detected by humans. Collectively, Scorpius demonstrates the possibility of poisoning medical KGs and manipulating downstream applications using LLMs, underscoring the importance of accountable and trustworthy medical knowledge discovery in the era of LLMs.
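The detectability claim in the abstract rests on perplexity: text with lower perplexity under a reference language model reads as more natural, so low-perplexity malicious abstracts are hard to flag. Below is a minimal sketch of that metric, not the authors' released code; it assumes GPT-2 (via the HuggingFace transformers library) as the reference scorer and uses a made-up example sentence, since the record does not specify the paper's actual scoring model.

    # Minimal perplexity sketch, assuming GPT-2 as the reference scorer.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def perplexity(text: str) -> float:
        """Perplexity = exp(mean token-level negative log-likelihood)."""
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # With labels supplied, the model returns the mean cross-entropy
            # loss over (internally shifted) next-token predictions.
            out = model(**enc, labels=enc["input_ids"])
        return torch.exp(out.loss).item()

    # Hypothetical example sentence; lower scores look more human-like.
    sample = "Aspirin reduces the risk of recurrent myocardial infarction in adults."
    print(f"perplexity: {perplexity(sample):.1f}")

Under this setup, comparing the scores of generated and genuine abstracts gives exactly the kind of perplexity comparison the abstract reports between Scorpius and ChatGPT outputs.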
Pages
1156-1168 (13 pages)
Related papers
50 items in total
  • [31] Large language models in medical ethics: useful but not expert
    Ferrario, Andrea; Biller-Andorno, Nikola
    JOURNAL OF MEDICAL ETHICS, 2024
  • [32] Evaluating large language models on medical evidence summarization
    Tang, Liyan; Sun, Zhaoyi; Idnay, Betina; Nestor, Jordan G.; Soroush, Ali; Elias, Pierre A.; Xu, Ziyang; Ding, Ying; Durrett, Greg; Rousseau, Justin F.; Weng, Chunhua; Peng, Yifan
    NPJ DIGITAL MEDICINE, 2023, 6 (01)
  • [34] Ethics of large language models in medicine and medical research
    Li, Hanzhou; Moon, John T.; Purkayastha, Saptarshi; Celi, Leo Anthony; Trivedi, Hari; Gichoya, Judy W.
    LANCET DIGITAL HEALTH, 2023, 5 (06): E333-E335
  • [35] Variability in Large Language Models' Responses to Medical Licensing and Certification Examinations. Comment on "How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment"
    Epstein, Richard H.; Dexter, Franklin
    JMIR MEDICAL EDUCATION, 2023, 9
  • [36] Using large language models in psychology
    Demszky, Dorottya; Yang, Diyi; Yeager, David; Bryan, Christopher; Clapper, Margarett; Chandhok, Susannah; Eichstaedt, Johannes; Hecht, Cameron; Jamieson, Jeremy; Johnson, Meghann; Jones, Michaela; Krettek-Cobb, Danielle; Lai, Leslie; Jonesmitchell, Nirel; Ong, Desmond; Dweck, Carol; Gross, James; Pennebaker, James
    NATURE REVIEWS PSYCHOLOGY, 2023, 2 (11): 688-701
  • [37] Knowledge graph construction for heart failure using large language models with prompt engineering
    Xu, Tianhan; Gu, Yixun; Xue, Mantian; Gu, Renjie; Li, Bin; Gu, Xiang
    FRONTIERS IN COMPUTATIONAL NEUROSCIENCE, 2024, 18
  • [40] Interactive computer-aided diagnosis on medical image using large language models
    Wang, Sheng; Zhao, Zihao; Ouyang, Xi; Liu, Tianming; Wang, Qian; Shen, Dinggang
    COMMUNICATIONS ENGINEERING, 2024, 3 (1)