With the increasing number of remote sensing (RS) data sources, the joint utilization of multimodal data for Earth observation tasks has become a crucial research topic. As a typical representative of RS data, hyperspectral images (HSIs) provide accurate spectral information, while light detection and ranging (LiDAR) data offer rich elevation information. However, owing to the significant differences between heterogeneous multimodal features, efficiently fusing HSI and LiDAR data remains a key challenge for existing research. In addition, existing methods do not fully exploit the edge contour information of images, which can easily lead to performance bottlenecks. Thus, a joint classification network for HSI and LiDAR data based on Mamba (HLMamba) is proposed. Specifically, a gradient joint algorithm (GJA) is first applied to the LiDAR data to obtain edge contour data of the land distribution. Subsequently, a multimodal feature extraction module (MFEM) is proposed to capture the semantic features of the HSI, LiDAR, and edge contour data. Then, to fuse the multimodal features efficiently, a novel deep learning (DL) framework, Mamba, is introduced, and a multimodal Mamba fusion module (MMFM) is constructed. By efficiently modeling the long-distance dependencies of multimodal sequences, the MMFM can better explore the internal features of multimodal data and the interrelationships between modalities, thereby enhancing fusion performance. Finally, to validate the effectiveness of HLMamba, a series of experiments is conducted on three common HSI and LiDAR datasets. The results indicate that HLMamba achieves superior classification performance compared with other state-of-the-art DL methods. The source code of the proposed method will be made publicly available at https://github.com/Dilingliao/HLMamba.
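To make the described pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the processing flow summarized above: edge extraction from the LiDAR elevation map, per-modality feature extraction, and sequence-level fusion followed by classification. All module names, layer choices, shapes, and hyperparameters here are illustrative assumptions rather than the authors' implementation; in particular, a Sobel gradient magnitude stands in for the GJA, and a generic recurrent sequence model stands in for the Mamba-based MMFM. The authors' actual code is at the GitHub link above.

```python
# Hypothetical sketch of an HLMamba-style pipeline; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_edge_map(lidar):
    """Stand-in for the GJA: Sobel gradient magnitude of the LiDAR elevation map.
    lidar: (B, 1, H, W) -> edge map of the same shape."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(lidar, kx.to(lidar), padding=1)
    gy = F.conv2d(lidar, ky.to(lidar), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

class HLMambaSketch(nn.Module):
    def __init__(self, hsi_bands, d_model=64, n_classes=15):
        super().__init__()
        # MFEM stand-in: one convolutional encoder per modality.
        self.hsi_enc = nn.Conv2d(hsi_bands, d_model, 3, padding=1)
        self.lidar_enc = nn.Conv2d(1, d_model, 3, padding=1)
        self.edge_enc = nn.Conv2d(1, d_model, 3, padding=1)
        # MMFM stand-in: a sequence model over the concatenated multimodal tokens.
        # A real implementation would use a Mamba (selective state-space) block here.
        self.seq_model = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, hsi, lidar):
        edge = gradient_edge_map(lidar)
        feats = [self.hsi_enc(hsi), self.lidar_enc(lidar), self.edge_enc(edge)]
        # Flatten each modality to tokens (B, H*W, C) and concatenate along the sequence.
        tokens = torch.cat([f.flatten(2).transpose(1, 2) for f in feats], dim=1)
        fused, _ = self.seq_model(tokens)
        return self.head(fused.mean(dim=1))  # per-patch class logits

# Usage: classify a 15-band HSI patch together with its LiDAR elevation patch.
logits = HLMambaSketch(hsi_bands=15)(torch.randn(2, 15, 11, 11), torch.randn(2, 1, 11, 11))
```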