Improving Chest X-Ray Report Generation by Leveraging Warm Starting
Times cited: 24
Authors:
Nicolson, Aaron [1]
Dowling, Jason [1]
Koopman, Bevan [1]
Affiliation:
[1] CSIRO Health and Biosecurity, Australian e-Health Research Centre, Brisbane, Australia
Keywords:
Chest X-ray report generation;
Image captioning;
Multi-modal learning;
Warm starting;
ARTIFICIAL-INTELLIGENCE;
RADIOLOGY
DOI:
10.1016/j.artmed.2023.102633
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory]
Subject classification codes:
081104;
0812;
0835;
1405
Abstract:
Automatically generating a report from a patient's Chest X-Rays (CXRs) is a promising way to reduce clinical workload and improve patient care. However, current CXR report generators, which are predominantly encoder-to-decoder models, lack the diagnostic accuracy required for deployment in a clinical setting. To improve CXR report generation, we investigate warm starting the encoder and decoder with recent open-source computer vision and natural language processing checkpoints, such as the Vision Transformer (ViT) and PubMedBERT. To this end, each checkpoint is evaluated on the MIMIC-CXR and IU X-Ray datasets. Our experimental investigation demonstrates that the Convolutional vision Transformer (CvT) ImageNet-21K and the Distilled Generative Pre-trained Transformer 2 (DistilGPT2) checkpoints are the best choices for warm starting the encoder and decoder, respectively. Compared with the previous state of the art (M2 Transformer Progressive), CvT2DistilGPT2 attained an improvement of 8.3% for clinical efficacy (CE) F-1, 1.8% for BLEU-4, 1.6% for ROUGE-L, and 1.0% for METEOR. The reports generated by CvT2DistilGPT2 are more similar to radiologist reports than those of previous approaches. This indicates that leveraging warm starting improves CXR report generation. Code and checkpoints for CvT2DistilGPT2 are available at https://github.com/achre/cvt2distiglgpt2.
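Warm starting, as used in the abstract, means initializing the encoder and decoder from separately pretrained checkpoints instead of from random weights. The following is a minimal, framework-free sketch of that idea under stated assumptions, not the authors' implementation: parameters are plain Python lists, and the helper names `warm_start` and `random_init` as well as the toy parameter keys (`encoder.patch_embed`, `decoder.token_embed`, `decoder.cross_attn`) are hypothetical. It assumes, as is typical when a vision encoder is paired with a decoder-only language model such as GPT-2, that the decoder's cross-attention weights have no counterpart in the language-model checkpoint and therefore start from random initialization. Real frameworks provide this pairing directly (for example, Hugging Face Transformers has `VisionEncoderDecoderModel.from_encoder_decoder_pretrained`); whether the paper's released code uses that route is not stated in this record.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: each named parameter is a flat list of floats.
Params = Dict[str, List[float]]


def random_init(shapes: Dict[str, int], seed: int = 0) -> Params:
    """Cold start: draw every parameter from a small Gaussian."""
    rng = random.Random(seed)
    return {
        name: [rng.gauss(0.0, 0.02) for _ in range(size)]
        for name, size in shapes.items()
    }


def warm_start(
    shapes: Dict[str, int],
    encoder_ckpt: Params,
    decoder_ckpt: Params,
    seed: int = 0,
) -> Tuple[Params, List[str]]:
    """Copy 'encoder.*' weights from encoder_ckpt and 'decoder.*' weights from
    decoder_ckpt. Anything without a same-sized match in its checkpoint
    (e.g. newly added cross-attention) keeps its random initialization.
    Returns the parameters and the names that were loaded."""
    params = random_init(shapes, seed)
    sources = {"encoder": encoder_ckpt, "decoder": decoder_ckpt}
    loaded: List[str] = []
    for name, size in shapes.items():
        prefix, _, local_name = name.partition(".")
        ckpt = sources.get(prefix)
        if ckpt is not None and local_name in ckpt and len(ckpt[local_name]) == size:
            params[name] = list(ckpt[local_name])
            loaded.append(name)
    return params, loaded


# Toy example: a 4-value "vision" embedding, a 3-value "language" embedding,
# and a 2-value cross-attention block that neither checkpoint provides.
shapes = {"encoder.patch_embed": 4, "decoder.token_embed": 3, "decoder.cross_attn": 2}
encoder_ckpt = {"patch_embed": [0.1, 0.2, 0.3, 0.4]}
decoder_ckpt = {"token_embed": [1.0, 2.0, 3.0]}

params, loaded = warm_start(shapes, encoder_ckpt, decoder_ckpt)
print(sorted(loaded))  # ['decoder.token_embed', 'encoder.patch_embed']
```

Only the cross-attention entry stays cold in this toy example, because its name has no match in either checkpoint; everything else inherits pretrained values, which is the property warm starting relies on.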
Pages: 17