Pulmonary mass segmentation;
Multi-modal MRI;
Spatial Transformer Networks (STN);
Joint training;
deep supervision;
NEURAL-NETWORK;
REGISTRATION;
DOI: 10.11999/JEIT210710
CLC number:
TM [Electrical Technology];
TN [Electronic Technology, Communication Technology];
Discipline code:
0808 ;
0809 ;
Abstract:
Most existing multi-modal segmentation methods operate on co-registered multi-modal images. However, such two-stage pipelines of separate registration and segmentation achieve low segmentation performance on modalities with remarkable spatial misalignment. To solve this problem, a cross-modal Spatial Alignment based Multi-Modal pulmonary mass Segmentation Network (MMSASegNet) with low model complexity and high segmentation accuracy is proposed. A dual-path Res-UNet is adopted as the backbone segmentation architecture for better multi-modal feature extraction. A Spatial Transformer Network (STN) is applied to the segmentation masks from the two paths to align the spatial information of the mass region. To realize multi-modal feature fusion based on spatial alignment of the mass region, the deformed mask and the reference mask are matrix-multiplied with the feature maps of their respective modalities. The resulting cross-modality spatially aligned feature maps are then fused and learned through a feature fusion module for multi-modal mass segmentation. To improve the performance of the end-to-end multi-modal segmentation network, a deep supervision learning strategy is employed, with a joint cost function constraining mass segmentation, mass spatial alignment, and feature fusion. Moreover, a multi-stage training strategy is adopted to improve the training efficiency of each module. On pulmonary mass datasets containing T2-Weighted MRI (T2W) and Diffusion-Weighted MRI (DWI) images, the proposed method achieves improvements in Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD).
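A minimal sketch of the mask-guided cross-modal fusion step described in the abstract. This is not the authors' code: it assumes that multiplying the masks with the feature maps amounts to element-wise gating of each modality's features by its (soft) mass mask, and that the fusion module receives a channel-wise concatenation of the gated features; the shapes and the function name `mask_gated_fusion` are illustrative assumptions.

```python
import numpy as np

def mask_gated_fusion(feat_t2w, feat_dwi, ref_mask, deformed_mask):
    """Gate each modality's features by its mask, then fuse (illustrative).

    feat_t2w, feat_dwi : (C, H, W) feature maps from the two Res-UNet paths
    ref_mask           : (H, W) mask predicted on the reference modality (T2W)
    deformed_mask      : (H, W) STN-warped mask from the other modality (DWI)

    Returns the channel-wise concatenation of the gated feature maps,
    standing in for the input to the feature fusion module.
    """
    gated_t2w = feat_t2w * ref_mask[None, :, :]       # keep only mass-region features
    gated_dwi = feat_dwi * deformed_mask[None, :, :]  # spatially aligned mass region
    return np.concatenate([gated_t2w, gated_dwi], axis=0)

# Toy usage with random features and square soft masks
rng = np.random.default_rng(0)
f_t2w = rng.standard_normal((8, 16, 16))
f_dwi = rng.standard_normal((8, 16, 16))
m_ref = np.zeros((16, 16)); m_ref[4:12, 4:12] = 1.0
m_def = np.zeros((16, 16)); m_def[5:13, 5:13] = 1.0
fused = mask_gated_fusion(f_t2w, f_dwi, m_ref, m_def)
print(fused.shape)  # (16, 16, 16): both modalities' gated features stacked
```

Outside the mask, features are suppressed to zero, so the fusion module only learns from the spatially aligned mass region of both modalities.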