MuSiC-ViT: A multi-task Siamese convolutional vision transformer for differentiating change from no-change in follow-up chest radiographs

Cited: 2
Authors
Cho, Kyungjin [1 ]
Kim, Jeeyoung [1 ]
Kim, Ki Duk [2 ]
Park, Seungju [3 ]
Kim, Junsik [1 ]
Yun, Jihye [4 ]
Ahn, Yura [5 ]
Oh, Sang Young [4 ]
Lee, Sang Min [6 ]
Seo, Joon Beom [4 ]
Kim, Namkug [7 ]
Affiliations
[1] Univ Ulsan, Asan Med Inst Convergence Sci & Technol, Coll Med, Asan Med Ctr,Dept Biomed Engn, Seoul, South Korea
[2] Univ Ulsan, Asan Med Ctr, Dept Convergence Med, Coll Med, Seoul, South Korea
[3] Korea Univ, Coll Hlth Sci, Dept Biomed Engn, Seoul, South Korea
[4] Univ Ulsan, Asan Med Ctr, Dept Radiol, Coll Med, Seoul, South Korea
[5] Univ Ulsan, Asan Med Ctr, Dept Radiol & Res Radiol, Coll Med, Seoul, South Korea
[6] Univ Ulsan, Asan Med Ctr, Dept Radiol, Coll Med, Seoul, South Korea
[7] Univ Ulsan, Asan Med Ctr, Dept Convergence Med, Coll Med, Seoul, South Korea
Keywords
CNNs meet vision transformers; Follow-up chest radiographs; Multi-task learning; Vision transformer; Siamese network; X-RAY; SEGMENTATION; COVID-19; CLASSIFICATION;
DOI
10.1016/j.media.2023.102894
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A major responsibility of radiologists in routine clinical practice is to read follow-up chest radiographs (CXRs) to identify changes in a patient's condition. Diagnosing meaningful changes in follow-up CXRs is challenging because radiologists must differentiate disease changes from natural or benign variations. Here, we propose a multi-task Siamese convolutional vision transformer (MuSiC-ViT) with an anatomy-matching module (AMM) that mimics the radiologist's cognitive process for differentiating change from no-change relative to the baseline CXR. MuSiC-ViT uses the "convolutional neural networks (CNNs) meet vision transformers" model, which combines CNN and transformer architectures. It has three major components: a Siamese network architecture, an AMM, and multi-task learning. Because the input is a pair of CXRs, a Siamese network was adopted for the encoder. The AMM is an attention module that focuses on related regions in the CXR pairs. To mimic a radiologist's cognitive process, MuSiC-ViT was trained with multi-task learning on normal/abnormal classification, change/no-change classification, and anatomy matching. Among the 406 K CXRs studied, 88 K change and 115 K no-change pairs were acquired for the training dataset. The internal validation dataset consisted of 1,620 pairs. To demonstrate the robustness of MuSiC-ViT, we verified the results on two further validation datasets. MuSiC-ViT achieved accuracies and areas under the receiver operating characteristic curve of 0.728 and 0.797 on the internal validation dataset, 0.614 and 0.784 on the first external validation dataset, and 0.745 and 0.858 on a second, temporally separated validation dataset, respectively. All code is available at https://github.com/chokyungjin/MuSiC-ViT.
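The data flow the abstract describes (one shared Siamese encoder for both CXRs, an anatomy-matching score between the two embeddings, and separate change/no-change and normal/abnormal heads) can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' implementation: the linear "encoder" stands in for the CNN-meets-ViT backbone, the cosine similarity stands in for the attention-based AMM, and names such as `anatomy_match` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared weights: in a Siamese network the SAME encoder processes both CXRs.
W_enc = rng.standard_normal((4096, 128)) * 0.01     # toy encoder: flatten -> project
W_change = rng.standard_normal((256, 2)) * 0.01     # change/no-change head (paired features)
W_abn = rng.standard_normal((128, 2)) * 0.01        # normal/abnormal head (per image)

def encode(x):
    """Toy stand-in for the CNN-meets-ViT encoder: flatten and project."""
    return np.tanh(x.reshape(-1) @ W_enc)

def anatomy_match(f1, f2):
    """Toy stand-in for the AMM: cosine similarity between the two embeddings."""
    return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-8))

def forward(x1, x2):
    f1, f2 = encode(x1), encode(x2)                 # Siamese: shared encoder weights
    match = anatomy_match(f1, f2)                   # anatomy-matching task
    change_logits = np.concatenate([f1, f2]) @ W_change   # change/no-change task
    abn_logits = (f1 @ W_abn, f2 @ W_abn)           # normal/abnormal task, per image
    return change_logits, abn_logits, match

baseline = rng.standard_normal((64, 64))            # toy 64x64 "baseline CXR"
followup = rng.standard_normal((64, 64))            # toy 64x64 "follow-up CXR"
change_logits, abn_logits, match = forward(baseline, followup)
```

Multi-task training would then combine losses over all three outputs; because the encoder is shared, an identical image pair always yields an anatomy-matching score of exactly 1.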
Pages: 12