Accurate diagnosis of Alzheimer's disease (AD) at its early stages, such as significant memory concern (SMC) and mild cognitive impairment (MCI), is essential for slowing its progression through timely treatment. Recent studies have shown that fusing multimodal neuroimaging data facilitates AD diagnosis. However, most existing fusion methods simply add or concatenate multimodal features and do not make full use of the nonlinear and texture features available across modalities. This paper proposes a diagnostic model that classifies AD at different stages by fusing functional magnetic resonance imaging (fMRI) and structural MRI (sMRI) information. First, the fMRI and sMRI scans are preprocessed, and a mean regional homogeneity (mReHo) transformation is applied to the preprocessed fMRI scans. Then, 3DMR-PCANet extracts features from the mReHo images, and a 3DResNet-10 model, built by stacking the basic ResNet module, extracts features from the sMRI scans. Next, the two sets of image features are fused by kernel canonical correlation analysis (KCCA). Finally, a support vector machine (SVM) classifies the fused features. Experimental results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset demonstrate the effectiveness of the proposed method. Specifically, the method improves on the accuracy, specificity, sensitivity, F1 score, and area under the curve (AUC) of existing methods in pairwise comparisons of normal control (NC) versus SMC, NC versus MCI, NC versus AD, SMC versus MCI, SMC versus AD, and MCI versus AD. These results indicate that the proposed method can exploit the correlation between the fMRI and sMRI data of the same subject and can effectively classify AD patients at different stages.
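To make the fusion step more concrete, the following is a minimal sketch of kernel canonical correlation analysis (KCCA) applied to two feature sets, written with NumPy only. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`rbf_kernel`, `center_kernel`, `kcca_fit`), the RBF kernel, the values of `gamma` and `reg`, the particular regularized formulation, and the choice to concatenate the two projections are all hypothetical. The random arrays stand in for the features that 3DMR-PCANet and 3DResNet-10 would produce.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def center_kernel(K):
    """Center a kernel matrix in feature space: K_c = H K H, H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def kcca_fit(Kx, Ky, n_components, reg):
    """Regularized KCCA on two centered kernel matrices (one common variant).

    Maximizes alpha^T Kx Ky beta subject to
    alpha^T (Kx + reg*I)^2 alpha = 1 and beta^T (Ky + reg*I)^2 beta = 1.
    With Rx = (Kx + reg*I)^{-1} Kx and Ry defined likewise, the solutions
    come from the singular value decomposition of Rx @ Ry.
    """
    eye = np.eye(Kx.shape[0])
    Rx = np.linalg.solve(Kx + reg * eye, Kx)
    Ry = np.linalg.solve(Ky + reg * eye, Ky)
    W, rho, Vt = np.linalg.svd(Rx @ Ry)  # singular values sorted descending
    # Map the singular vectors back to dual coefficients.
    alpha = np.linalg.solve(Kx + reg * eye, W[:, :n_components])
    beta = np.linalg.solve(Ky + reg * eye, Vt[:n_components].T)
    return alpha, beta, rho[:n_components]


# Synthetic stand-ins for per-subject features (assumption: real inputs would
# be the 3DMR-PCANet features of mReHo images and the 3DResNet-10 features
# of sMRI scans). Both views depend on a shared latent factor z.
rng = np.random.default_rng(0)
n = 60
z = rng.normal(size=n)
X_fmri = np.column_stack([z, z ** 2]) + 0.1 * rng.normal(size=(n, 2))
X_smri = np.column_stack([np.sin(z), np.cos(z)]) + 0.1 * rng.normal(size=(n, 2))

Kx = center_kernel(rbf_kernel(X_fmri, gamma=0.5))
Ky = center_kernel(rbf_kernel(X_smri, gamma=0.5))
alpha, beta, rho = kcca_fit(Kx, Ky, n_components=3, reg=0.1)

# Fused representation: concatenate the two sets of canonical projections
# (an assumption; the abstract does not say how the projections are combined).
fused = np.hstack([Kx @ alpha, Ky @ beta])
print("fused shape:", fused.shape)
print("regularized canonical correlations:", rho)
```

In a full pipeline, the rows of `fused` would be passed to a classifier such as an SVM, and the projection would be fitted on training subjects only so that no test information leaks into the fused features.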