As an important component of urban infrastructure, the sewer system has a significant influence on the attainment of all sustainable development goals. Groundwater infiltration (GWI) into sewers imposes a hydraulic burden on wastewater collection networks, which ultimately reduces the overall effectiveness of wastewater treatment. Addressing this challenge requires an efficient and accurate approach for identifying infiltration sources and quantifying infiltration volume. This paper therefore introduces a two-stage simulation-based inverse optimization model (SIOM). At the regional scale, a clustering analysis is first conducted on the influencing indicators related to local spatial dependence in pipe network degradation. The spatial clustering effect of GWI is then incorporated into the inverse optimization procedure, which is based on segment-level modeling. A cluster-based genetic algorithm (CGA) allows GWI sources and flows to be delineated more precisely. Geographically weighted regression (GWR), a spatial statistical approach, is used to determine how explanatory factors influence infiltration propensity in sewers under spatial heterogeneity. In our case study, GWI contributed approximately 36% of the total dry-weather inflow (34,373 m³/d) to the sewer system. The CGA improves convergence speed by 25% and prediction accuracy by 7.6%. The model performs best when the membership function follows a Gaussian distribution with a lower mean value, reaching a Nash-Sutcliffe efficiency (NSE) of 0.779. Explanatory factors such as pipe diameter, slope, burial depth, road density, and building density show clear spatial heterogeneity and have varying effects on infiltration tendency; among them, pipe diameter shows the most significant local effect.
For investigating GWI in large-scale sewer systems, the method outperforms traditional CCTV inspection and other direct measurement techniques in computational efficiency and modeling accuracy.
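
For reference, the NSE reported above is conventionally defined as one minus the ratio of the sum of squared residuals to the sum of squared deviations of the observations from their mean, so a value of 1 indicates a perfect fit and a value of 0 indicates a model no better than the observed mean. The following is a minimal Python sketch of that standard definition only; the function name and the sample flow values are illustrative assumptions and are not code or data from this study.

```python
from typing import Sequence


def nash_sutcliffe_efficiency(observed: Sequence[float],
                              simulated: Sequence[float]) -> float:
    """Return the Nash-Sutcliffe efficiency of simulated vs. observed values.

    NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
    """
    if len(observed) == 0 or len(observed) != len(simulated):
        raise ValueError("observed and simulated must be non-empty and equal in length")

    mean_observed = sum(observed) / len(observed)

    # Sum of squared model residuals.
    residual_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    total_ss = sum((o - mean_observed) ** 2 for o in observed)

    if total_ss == 0:
        raise ValueError("NSE is undefined when all observed values are identical")

    return 1.0 - residual_ss / total_ss


if __name__ == "__main__":
    # Hypothetical flow values (illustrative only, not from the case study).
    observed_flow = [1.0, 2.0, 3.0, 4.0]
    simulated_flow = [1.5, 2.0, 3.0, 3.5]
    print(nash_sutcliffe_efficiency(observed_flow, simulated_flow))
```

Because the denominator is the spread of the observations themselves, NSE values are comparable across sites only when the observed flows have similar variability.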