To address the blurred images of sea surface objects caused by the complex, undulating sea surface environment, we propose Sw-YoloX, which uses global modeling to encode the key semantics of sea surface objects and thereby obtains global features that a CNN cannot capture. The convolutional block attention module (CBAM) and the atrous spatial pyramid pooling (ASPP) module are integrated into the neck of the detector, and a decoupled head serves as the prediction part. We also adopt several training strategies to improve detector performance, such as the simple optimal transport assignment (SimOTA) strategy and multi-model integration. Finally, we construct the XM-10000 dataset for validation from sea surface monitoring data collected in Xiamen, China. With end-to-end training, Sw-YoloX outperforms the baseline and mainstream detectors, achieving an F1-score of 78.1, a mean average precision (mAP) of 54.4, and an average recall (AR) of 72.0. This work, which has been deployed in the coastal defense department in Xiamen, China, has important implications for searching for survivors and preventing smuggling.
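As a reminder of how the reported F1-score relates to precision and recall, the following minimal sketch computes F1 as their harmonic mean. The function name `f1_score` and the input values are illustrative assumptions; they are not taken from the paper, which reports only the final F1-score.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical inputs for illustration only; not results from the paper.
example_f1 = f1_score(0.8, 0.6)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector scores a high F1 only when precision and recall are both high.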