Flood risk assessment is critical for mitigating economic losses and enhancing urban disaster resilience, especially as climate change and rapid urbanization increase flood vulnerability. However, traditional machine learning models often struggle to capture complex spatial patterns and nonlinear relationships, which limits their predictive accuracy. To address this challenge, this study introduces a hybrid machine learning model, 2D-CNN-CapsNet-WOA, for urban flood risk prediction. By integrating two-dimensional convolutional neural networks (CNN), capsule networks (CapsNet), and the whale optimization algorithm (WOA), the proposed model identifies high-risk areas and the key factors that influence flood risk. The results show that the model achieves high predictive performance and also yields insights into urban flood dynamics. High-risk zones were concentrated predominantly in the central urban areas of Guangzhou, Shenzhen, and Foshan, reflecting their high exposure to flood hazards. In contrast, low-risk zones were found in peripheral and mountainous areas such as Zhaoqing, underscoring the spatial variability of flood risk. Key factors such as distance to hospitals (DTH) and distance to water bodies (DTW) emerged as primary drivers of flood risk, whereas natural factors such as the Sediment Transport Index (STI), Stream Power Index (SPI), and Topographic Wetness Index (TWI) had relatively smaller impacts. This study advances flood risk assessment by demonstrating that hybrid machine learning approaches can effectively capture spatial and contextual factors. Beyond the case study, the methodology offers a scalable and transferable framework for urban flood risk modeling, providing practical guidance for disaster management and urban planning in flood-prone regions worldwide.
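The abstract does not state what the WOA component optimizes; a common role for metaheuristics in such hybrids is tuning network hyperparameters. As a rough illustration only, the sketch below implements the standard WOA update rules (shrinking encirclement of the best solution, exploration around a random whale, and the logarithmic spiral "bubble-net" move) to minimize a generic objective over a bounded box. The function name `woa_minimize`, its parameters, and the toy objective are hypothetical and are not the study's implementation.

```python
import numpy as np


def woa_minimize(fitness, lower, upper, n_whales=20, max_iter=100, b=1.0, seed=0):
    """Hypothetical sketch: minimize `fitness` over a box with the standard WOA."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Random initial positions (candidate solutions) inside the search box.
    X = rng.uniform(lower, upper, size=(n_whales, dim))
    scores = np.array([fitness(x) for x in X])
    best_idx = int(np.argmin(scores))
    best, best_score = X[best_idx].copy(), float(scores[best_idx])

    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter  # control parameter, decreases linearly from 2 to 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a  # step coefficient in [-a, a]
            C = 2.0 * rng.random()          # random weight on the reference position
            if rng.random() < 0.5:
                if abs(A) < 1.0:
                    # Exploitation: shrink the encircling circle around the best solution.
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D
                else:
                    # Exploration: move relative to a randomly chosen whale instead.
                    ref = X[rng.integers(n_whales)]
                    D = np.abs(C * ref - X[i])
                    X[i] = ref - A * D
            else:
                # Spiral (bubble-net) move toward the best solution.
                l = rng.uniform(-1.0, 1.0)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(b * l) * np.cos(2.0 * np.pi * l) + best

            # Keep the whale inside the bounds, then score it and track the best.
            X[i] = np.clip(X[i], lower, upper)
            score = float(fitness(X[i]))
            if score < best_score:
                best, best_score = X[i].copy(), score

    return best, best_score


# Toy usage: minimize the sphere function in two dimensions.
best, score = woa_minimize(lambda x: float(np.sum(x ** 2)), [-5.0, -5.0], [5.0, 5.0],
                           n_whales=30, max_iter=200, seed=1)
```

In a hybrid model like the one described, `fitness` would presumably wrap training and validating the CNN-CapsNet network for a candidate hyperparameter vector (for example, learning rate and dropout rate). Because each evaluation would then be a full training run, `n_whales * max_iter` is usually kept small in that setting.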