With the development of sensors, the application of multi-source remote sensing data has attracted wide attention. Because hyperspectral images (HSIs) contain rich spectral information and light detection and ranging (LiDAR) data contain elevation information, using the two jointly for ground object classification can yield good results, especially with deep networks. Multi-scale deep networks expand the receptive field of convolution without the computational and training problems that come with simply adding more network layers. In this work, a multi-scale feature fusion network is proposed for the joint classification of HSI and LiDAR data. First, we design a multi-scale spatial feature extraction module with cross-channel connections, which extracts and fuses the spatial information of the HSI data and the elevation information of the LiDAR data. Second, a multi-scale spectral feature extraction module extracts multi-scale spectral features from the HSI data. Finally, joint multi-scale features are obtained by weighting and concatenation and are then fed into the classifier. To verify the effectiveness of the proposed network, experiments are carried out on the MUUFL Gulfport and Trento datasets. The experimental results demonstrate that the proposed method outperforms other state-of-the-art methods in classification performance. © 2023 Beijing Institute of Technology. All rights reserved.
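
The remark that multi-scale networks enlarge the receptive field without adding layers can be illustrated with simple arithmetic. The sketch below is not the proposed network. It assumes stride-1 convolutions and a hypothetical multi-scale block with parallel 3, 5, and 7 kernels, since the abstract does not state the kernel sizes used. The function names are illustrative only.

```python
def stacked_receptive_field(kernel_sizes, dilations=None):
    """Receptive field (one side, in pixels) of stride-1 convolutions
    applied one after another. Each layer adds dilation * (kernel - 1)."""
    if dilations is None:
        dilations = [1] * len(kernel_sizes)
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += d * (k - 1)
    return rf


def multiscale_receptive_fields(branch_kernel_sizes):
    """Receptive field of each branch in a single-layer parallel block.
    Branch kernel sizes are an assumption, not taken from the paper."""
    return [stacked_receptive_field([k]) for k in branch_kernel_sizes]


# Three stacked 3x3 layers are needed to reach a 7-pixel receptive field.
deep_rf = stacked_receptive_field([3, 3, 3])

# One parallel block with 3, 5, and 7 kernels covers three scales at depth 1.
branch_rfs = multiscale_receptive_fields([3, 5, 7])

print(deep_rf, branch_rfs)
```

Under these assumptions, the single parallel block reaches the same maximum receptive field as three stacked 3x3 layers while also retaining the smaller scales, which is the trade-off the abstract alludes to.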