Generation and evaluation of artificial mental health records for Natural Language Processing

Cited by: 30
Authors
Ive, Julia [1]
Viani, Natalia [2]
Kam, Joyce [2]
Yin, Lucia [2]
Verma, Somain [2]
Puntis, Stephen [3]
Cardinal, Rudolf N. [4,5]
Roberts, Angus [2]
Stewart, Robert [2,6]
Velupillai, Sumithra [2]
Affiliations
[1] Imperial Coll London, Dept Comp, London SW7 2AZ, England
[2] Kings Coll London, IoPPN, London SE5 8AF, England
[3] Univ Oxford, Warneford Hosp, Dept Psychiat, Oxford OX3 7JX, England
[4] Univ Cambridge, Dept Psychiat, Downing St, Cambridge CB2 3EB, England
[5] Cambridgeshire & Peterborough NHS Fdn, Cambridge Biomed Campus, Box 190, Cambridge CB2 0QQ, England
[6] South London & Maudsley NHS Fdn Trust, London SE5 8AZ, England
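Funding
Medical Research Council (UK); Swedish Research Council; National Institutes of Health (US); Engineering and Physical Sciences Research Council (UK); UK Research and Innovation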
Keywords
Intensive care units
DOI
10.1038/s41746-020-0267-x
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate the clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.
Pages: 9