The detection of road potholes plays a crucial role in ensuring passenger comfort and the structural safety of vehicles. To address the challenges of pothole detection in complex road environments, this paper proposes a model that focuses on shape features, termed pothole detection you only look once (PD-YOLO). First, a feature extraction module is constructed that adapts to variations in pothole shape, overcoming the limited multi-scale feature learning caused by the fixed convolutional kernels of the baseline model. Second, a cross-stage partial network is designed with a one-shot aggregation strategy, simplifying the model while allowing the network to fuse information across feature maps from different stages. Finally, a dynamic sparse attention mechanism is introduced to select relevant features, reducing redundancy and suppressing background noise. Experiments on the VOC2007 and GRDDC2020_Pothole datasets show that, compared with the YOLOv8 baseline, PD-YOLO improves mean average precision by 3.9% and 2.8%, respectively, while running at approximately 290 frames per second, meeting both the accuracy and real-time requirements of pothole detection. The code and dataset for this paper are available at: .
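The abstract does not specify how the dynamic sparse attention selects relevant features. As a rough, non-authoritative illustration only, the PyTorch sketch below shows one common form of dynamic sparse attention: each query region attends to only its top-k most relevant key regions, in the spirit of bi-level routing attention. The class name, region size, and top_k value are assumptions for illustration, not the paper's actual module.

```python
# Minimal sketch of top-k routed (dynamic sparse) region attention.
# All names and hyperparameters are illustrative assumptions, not PD-YOLO's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKRegionAttention(nn.Module):
    def __init__(self, dim, region=5, top_k=4):
        super().__init__()
        self.region = region      # spatial size of each square region
        self.top_k = top_k        # key regions each query region attends to
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)
        self.scale = dim ** -0.5

    def forward(self, x):
        b, c, h, w = x.shape
        r = self.region
        assert h % r == 0 and w % r == 0, "feature map must be divisible by region size"
        q, k, v = self.qkv(x).chunk(3, dim=1)

        # Partition each map into (h//r * w//r) non-overlapping regions of r*r tokens.
        def to_regions(t):
            t = t.view(b, c, h // r, r, w // r, r)
            return t.permute(0, 2, 4, 3, 5, 1).reshape(b, -1, r * r, c)

        qr, kr, vr = map(to_regions, (q, k, v))          # (b, n_regions, r*r, c)

        # Region-level routing: score regions by their mean descriptors,
        # then keep only the top-k key regions for every query region.
        q_mean, k_mean = qr.mean(dim=2), kr.mean(dim=2)  # (b, n_regions, c)
        routing = q_mean @ k_mean.transpose(1, 2)        # (b, n_regions, n_regions)
        idx = routing.topk(self.top_k, dim=-1).indices   # (b, n_regions, top_k)

        # Gather the selected key/value regions and attend inside them only.
        idx_exp = idx[..., None, None].expand(-1, -1, -1, r * r, c)
        k_sel = torch.gather(kr.unsqueeze(1).expand(-1, idx.size(1), -1, -1, -1), 2, idx_exp)
        v_sel = torch.gather(vr.unsqueeze(1).expand(-1, idx.size(1), -1, -1, -1), 2, idx_exp)
        k_sel = k_sel.flatten(2, 3)                      # (b, n_regions, top_k*r*r, c)
        v_sel = v_sel.flatten(2, 3)

        attn = (qr @ k_sel.transpose(-2, -1)) * self.scale
        out = F.softmax(attn, dim=-1) @ v_sel            # (b, n_regions, r*r, c)

        # Restore the (b, c, h, w) layout and apply the output projection.
        out = out.reshape(b, h // r, w // r, r, r, c).permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        return self.proj(out)
```

For example, applied to a hypothetical stride-32 feature map of shape (1, 256, 20, 20), the module would split the map into 16 regions of 5x5 tokens and let each region attend to its 4 most related regions, which is what makes the attention sparse and content-dependent rather than global.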