The surge in remote sensing (RS) data underscores the need for greater data diversity and better processing. Integrating hyperspectral (HS) and light detection and ranging (LiDAR) data enhances analysis and mitigates spectral variability, but the high dimensionality, noise, and outliers inherent in hyperspectral images present significant challenges. In addition, the precise labeling that HS data require makes supervised classification labor-intensive, time-consuming, and dependent on expert knowledge, which motivates the development of HS clustering algorithms. Unsupervised clustering avoids the labeling burden but still struggles with three problems: the underutilization of auxiliary spatial and structural information, high data dimensionality with redundant hyperspectral bands, and information divergence caused by heterogeneity among multimodal data. These problems impede the extraction of consistent structures and undermine clustering stability and overall model performance. To address them, we propose a superpixel-based bipartite graph clustering (SBGC) model, enriched with spatial information, for hyperspectral and LiDAR data. The method uses spatial information to construct meaningful bipartite graphs, which allows multimodal RS data to be processed efficiently. By adopting a projected clustering paradigm, it performs clustering and dimensionality reduction simultaneously, eliminating redundant bands. It also stacks the multimodal data into tensors to explore the consistent structures shared by the different modalities in the low-rank space, which reduces heterogeneity-induced information divergence and improves clustering performance. Extensive experiments on real datasets confirm the effectiveness of the proposed method.
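To make the idea of a superpixel-based bipartite graph concrete, the following is a minimal sketch in Python with NumPy, not the SBGC model itself. It assumes that a superpixel segmentation is already available as an integer label per pixel, that each superpixel contributes one anchor (the mean feature vector of its pixels), and that each pixel is linked to its `k` nearest anchors with Gaussian weights. The function names `superpixel_anchors` and `bipartite_graph`, the choice of `k`, and the bandwidth heuristic are all illustrative assumptions.

```python
import numpy as np


def superpixel_anchors(features, segments):
    """Return one anchor per superpixel: the mean feature vector of its pixels."""
    ids = np.unique(segments)
    return np.stack([features[segments == s].mean(axis=0) for s in ids])


def bipartite_graph(features, anchors, k=2):
    """Link each pixel to its k nearest anchors with row-normalized Gaussian weights.

    Returns an (n_pixels, n_anchors) matrix whose rows sum to 1.
    """
    # Squared Euclidean distance between every pixel and every anchor: (n, m).
    d2 = ((features[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    n, m = d2.shape
    k = min(k, m)
    # Column indices of the k closest anchors for each pixel.
    nearest = np.argsort(d2, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    dk = d2[rows, nearest]
    # Assumed bandwidth heuristic: mean of the retained squared distances.
    sigma2 = dk.mean() + 1e-12
    w = np.exp(-dk / sigma2)
    w /= w.sum(axis=1, keepdims=True)
    Z = np.zeros((n, m))
    Z[rows, nearest] = w
    return Z


# Toy data (hypothetical): 12 pixels with 5 spectral bands, 3 superpixels of 4 pixels.
rng = np.random.default_rng(0)
features = rng.normal(size=(12, 5))
segments = np.repeat(np.arange(3), 4)

anchors = superpixel_anchors(features, segments)
Z = bipartite_graph(features, anchors, k=2)
```

Because the graph has one column per superpixel rather than one per pixel, its size grows with the number of pixels times the number of superpixels instead of quadratically in the number of pixels. In a multimodal setting, one such matrix could be built per modality (for example, one from HS features and one from LiDAR features) before the matrices are stacked; that step is not shown here.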