Coastal wetlands are of great importance for protecting biodiversity, mitigating climate change, and providing natural resources. Classifying and mapping coastal wetlands from optical remote sensing data with deep learning methods makes it possible to monitor wetland change effectively, which is crucial for their protection. However, most current wetland classification methods use single-temporal data, and relatively few studies address multi-temporal data. For the wetland classification task in the Bohai Rim region of China, this study therefore proposes Swin-MTNet, an improved model based on the state-of-the-art deep learning model Swin-UNet, to better capture temporal feature variations in multi-temporal Sentinel-2 imagery. Swin-MTNet is compared with Swin-UNet and DeepLabV3+ on multi-temporal data. It improves overall accuracy by 5.12% and 2.85% and the Kappa coefficient by 6.85% and 3.86% over Swin-UNet and DeepLabV3+, respectively. The improvement is most significant for Spartina alterniflora, whose F1 score increases by 0.45 and 0.47 relative to Swin-UNet and DeepLabV3+, respectively. These results demonstrate that Swin-MTNet effectively leverages the temporal features of multi-temporal data and significantly improves the accuracy of coastal wetland classification.
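
For readers less familiar with the three evaluation metrics reported here (overall accuracy, Kappa coefficient, and per-class F1 score), the following is a minimal sketch of how they are conventionally derived from a confusion matrix. It is not the evaluation code of this study: the function names, the row/column convention (rows as reference labels, columns as predictions), and the toy two-class matrix are all assumptions made for illustration.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def kappa_coefficient(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Expected agreement if reference and prediction were independent.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (observed - expected) / (1 - expected)


def f1_per_class(cm):
    """F1 score for each class, assuming rows = reference, columns = predicted."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # reference k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, reference other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical two-class confusion matrix (not data from the study).
toy_cm = [[50, 10],
          [5, 35]]

print("overall accuracy:", overall_accuracy(toy_cm))
print("kappa coefficient:", kappa_coefficient(toy_cm))
print("per-class F1:", f1_per_class(toy_cm))
```

Because the Kappa coefficient discounts chance agreement, it is usually lower than overall accuracy on the same matrix, which is why the two are commonly reported together. Per-class F1 highlights gains on individual classes, such as Spartina alterniflora, that an aggregate metric can hide.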