Accurate and efficient mapping of water bodies is of great significance for water resources monitoring, management, and application. Extracting water bodies from remote sensing images efficiently and accurately remains challenging because of the diversity of their shapes, sizes, and distributions and the complexity of the scenes. Although traditional methods can extract water bodies from remote sensing images, their accuracy falls short of practical application requirements because different objects can share the same spectrum. There is therefore an urgent demand for high-performance techniques that improve both the efficiency and the accuracy of water body extraction. Combining deep learning with remote sensing technology can exploit the strengths of deep learning and contribute to accurate water body extraction. Deep learning methods for water body extraction still face challenges such as multi-scale feature fusion, long processing times, and large numbers of parameters. The HarDNet-MSEG model offers high segmentation accuracy and fast inference speed. To make fuller use of information at the channel and spatial-location levels and to improve the segmentation accuracy of the model, this paper proposes a Hybrid Attention Mechanism (HAM) integrated into the HarDNet-MSEG network framework. The HAM is embedded at different positions in the HarDNet-MSEG network to find its optimal position within the architecture, and a series of comparative experiments are conducted under the same experimental conditions against other attention mechanisms, classical network algorithms, and traditional methods. The generalizability of the model on other datasets is also tested. The results show that the HAM module performs best at the shallower layers of the HarDNet-MSEG network.
Compared with other attention mechanisms, the HAM module achieves higher performance, with MIoU, FWIoU, and PA reaching 94.0687%, 97.7374%, and 99.3205%, respectively. Compared with classical models such as DeepLabV3+, U-Net, and PSPNet, the HarDNet-MSEG-HAM1 model not only achieves the highest MIoU but also performs well in terms of parameter count, computational cost, and training time. The HarDNet-MSEG-HAM1 model shows significant advantages over the traditional methods and also performs favorably on other datasets. Finally, lakes in the endorheic region of the Qinghai-Tibet Plateau are successfully extracted for four periods (2013, 2016, 2019, and 2022), and their area changes are analyzed. The experimental results demonstrate the superiority and robustness of the proposed model in water body extraction tasks. This study aims to provide a methodological framework and relevant data for extracting water bodies from complex remote sensing images. © 2024 Science Press. All rights reserved.
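
The abstract reports MIoU, FWIoU, and PA without giving their formulas. The sketch below computes them from a confusion matrix, assuming the definitions commonly used in semantic segmentation (pixel accuracy, mean intersection over union, and frequency-weighted intersection over union). The function names `confusion_matrix` and `segmentation_metrics` and the two-class water/background setup are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=2):
    """Count pixels per (true class, predicted class) pair.

    Rows index the ground-truth class, columns the predicted class.
    Labels are assumed to be integers in [0, num_classes).
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    index = y_true * num_classes + y_pred
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    """Return PA, MIoU, and FWIoU under the commonly used definitions."""
    cm = cm.astype(np.float64)
    true_positive = np.diag(cm)
    gt_pixels = cm.sum(axis=1)    # pixels per ground-truth class
    pred_pixels = cm.sum(axis=0)  # pixels per predicted class
    total = cm.sum()

    union = gt_pixels + pred_pixels - true_positive
    present = union > 0           # skip classes absent from both maps
    iou = np.zeros_like(true_positive)
    iou[present] = true_positive[present] / union[present]

    pa = true_positive.sum() / total
    miou = iou[present].mean()
    fwiou = ((gt_pixels / total) * iou).sum()
    return {"PA": pa, "MIoU": miou, "FWIoU": fwiou}


if __name__ == "__main__":
    # Toy 2 x 4 masks: 1 = water, 0 = background (hypothetical data).
    truth = [[0, 0, 1, 1], [0, 0, 1, 1]]
    prediction = [[0, 0, 1, 0], [0, 0, 1, 1]]
    print(segmentation_metrics(confusion_matrix(truth, prediction)))
```

Because FWIoU weights each class by its share of ground-truth pixels, a dominant background class can push it above MIoU when water covers only a small part of the scene, which is consistent with the ordering of the reported values.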