In real-time semantic segmentation of drone imagery, current lightweight algorithms fail to integrate global and local information in the image, leading to missed detections and misclassified categories. This paper proposes a real-time semantic segmentation method for drone imagery that fuses multi-scale global context information. The method adopts a UNet structure, with a ResNet18 encoder extracting features. The decoder incorporates a global-local attention module: the global branch compresses and extracts global information along the vertical and horizontal directions, while the local branch extracts local information through convolution, thereby strengthening the fusion of global and local information in the image. In the segmentation head, a shallow-feature fusion module integrates the multi-scale features extracted by the encoder, reinforcing the spatial information in the shallow features. The model was evaluated on the UAVid and UDD6 datasets, achieving 68% mIoU (mean Intersection over Union) and 67% mIoU, respectively, 10% and 21.2% higher than the baseline model UNet. The model runs at 72.4 frames/s, 54.4 frames/s faster than the baseline UNet. The experimental results demonstrate that the proposed model achieves a good balance between accuracy and real-time performance.
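The abstract does not specify the exact layer configuration of the global-local attention module, so the following PyTorch sketch illustrates one plausible reading: strip pooling compresses features along the vertical and horizontal directions for the global branch, and a 3×3 convolution supplies the local branch. The class name `GlobalLocalAttention` and all layer choices here are illustrative assumptions, not the paper's verified design.

```python
import torch
import torch.nn as nn


class GlobalLocalAttention(nn.Module):
    """Sketch of a global-local attention block (assumed configuration).

    Global branch: strip pooling compresses the feature map along the
    vertical and horizontal axes to capture direction-wise global context.
    Local branch: a 3x3 convolution extracts local detail.
    The branches are fused by gating local features with a global map.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Global branch: pool to H x 1 and 1 x W strips, then strip convs.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # N x C x H x 1
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # N x C x 1 x W
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        # Local branch: plain 3x3 convolution with batch norm.
        self.conv_local = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h_ctx = self.conv_h(self.pool_h(x))  # vertical-direction context
        w_ctx = self.conv_w(self.pool_w(x))  # horizontal-direction context
        # Broadcasting the two strips back to H x W yields a global map.
        global_attn = torch.sigmoid(h_ctx + w_ctx)
        local_feat = self.bn(self.conv_local(x))
        # Gate the local features with the global attention map.
        return x + local_feat * global_attn


if __name__ == "__main__":
    block = GlobalLocalAttention(channels=64)
    feats = torch.randn(1, 64, 32, 32)
    print(block(feats).shape)  # torch.Size([1, 64, 32, 32])
```

In this reading, the additive broadcast of the two pooled strips recovers a full-resolution global context map at low cost, which is one common way lightweight models approximate global attention without the quadratic expense of full self-attention.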