Developing automated methods for disease subtyping in UK Biobank: an exemplar study on stroke

Cited by: 9
Authors
Rannikmae, Kristiina [1, 2]
Wu, Honghan [2, 3]
Tominey, Steven [4]
Whiteley, William [5, 6]
Allen, Naomi [6, 7]
Sudlow, Cathie [1, 2, 8]
Affiliations
[1] Univ Edinburgh, Med Informat Ctr, NINE Edinburgh Bioquarter, 9 Little France Rd, Edinburgh EH16 4UX, Midlothian, Scotland
[2] Hlth Data Res UK, London, England
[3] UCL, Inst Hlth Informat, London, England
[4] Univ Edinburgh, Sch Med, Edinburgh, Midlothian, Scotland
[5] Univ Edinburgh, Ctr Clin Brain Sci, Edinburgh, Midlothian, Scotland
[6] Univ Oxford, Nuffield Dept Populat Hlth, Oxford, England
[7] UK Biobank, Stockport, Lancs, England
[8] BHF Data Sci Ctr, London, England
Keywords
Natural language processing; Disease subtyping; Stroke; Cerebral hemorrhage; Brain scan; TEXT
DOI
10.1186/s12911-021-01556-0
Chinese Library Classification
R-058
Abstract
Background: Better phenotyping of routinely collected coded data would be useful for research and health improvement. For example, the precision of coded data for hemorrhagic stroke (intracerebral hemorrhage [ICH] and subarachnoid hemorrhage [SAH]) may be as poor as < 50%. This work aimed to investigate the feasibility and added value of automated methods applied to clinical radiology reports to improve stroke subtyping.
Methods: From a sub-population of 17,249 Scottish UK Biobank participants, we ascertained those with an incident stroke code in hospital, death record or primary care administrative data by September 2015, and at least one clinical brain scan report. We used a combination of natural language processing and clinical knowledge inference on brain scan reports to assign a stroke subtype (ischemic vs ICH vs SAH) for each participant and assessed performance by precision and recall at entity and patient levels.
Results: Of 225 participants with an incident stroke code, 207 had a relevant brain scan report and were included in this study. Entity-level precision and recall ranged from 78% to 100%. At patient level, the automated methods showed precision and recall that were very good for ICH (both 89%), good for SAH (both 82%), but, as expected, lower for ischemic stroke (73% and 64%, respectively), suggesting that coded data remain the preferred method for identifying the latter stroke subtype.
Conclusions: Our automated method applied to radiology reports provides a feasible, scalable and accurate solution to improve disease subtyping when used in conjunction with administrative coded health data. Future research should validate these findings in a different population setting.
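The patient-level evaluation described in the abstract (precision and recall per stroke subtype) can be illustrated with a minimal sketch. This is not the authors' code: the function name `precision_recall`, the label strings, and the toy inputs are all hypothetical, and it assumes each participant has exactly one reference subtype and one automatically assigned subtype.

```python
from collections import Counter

# Hypothetical label set mirroring the three subtypes named in the abstract.
SUBTYPES = ("ischemic", "ICH", "SAH")


def precision_recall(reference, predicted, subtypes=SUBTYPES):
    """Return {subtype: (precision, recall)} from two parallel label lists.

    reference: one reference-standard subtype per participant.
    predicted: one automatically assigned subtype per participant.
    A score is None when its denominator is zero.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for ref, pred in zip(reference, predicted):
        if ref == pred:
            tp[ref] += 1   # correctly assigned subtype
        else:
            fp[pred] += 1  # assigned subtype was wrong
            fn[ref] += 1   # true subtype was missed

    scores = {}
    for s in subtypes:
        assigned = tp[s] + fp[s]  # participants the method labelled s
        actual = tp[s] + fn[s]    # participants whose reference label is s
        precision = tp[s] / assigned if assigned else None
        recall = tp[s] / actual if actual else None
        scores[s] = (precision, recall)
    return scores


if __name__ == "__main__":
    # Toy labels for illustration only; these are not study data.
    reference = ["ischemic", "ischemic", "ICH", "SAH", "ICH", "ischemic"]
    predicted = ["ischemic", "ICH", "ICH", "SAH", "ICH", "ischemic"]
    for subtype, (p, r) in precision_recall(reference, predicted).items():
        print(subtype, p, r)
```

Precision here is the share of participants assigned a subtype who truly have it, and recall is the share of participants with that subtype whom the method identified, which matches the usual reading of the figures quoted in the Results.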
Pages: 9