This study investigates how well different deep learning architectures classify medical imaging data, with a particular emphasis on identifying schizophrenia. Six models were assessed: a 4D ResNet architecture, two CNNs, a CNN-LSTM hybrid, EfficientNetV2, and MobileNetV3. The models were evaluated on the COBRE dataset using 5-fold cross-validation. Besides evaluating deep learning architectures, this work includes a preprocessing pipeline for fMRI data and exploratory data analysis. The data are organized for efficient management, and dimensionality is reduced using methods such as PCA. The first CNN model achieved a test accuracy of 94.7%, and the enhanced CNN model achieved 99.75%. The CNN-LSTM hybrid also performed remarkably well, with a test accuracy of 99.74%. EfficientNetV2 and MobileNetV3, on the other hand, achieved lower accuracies of 93.23% and 63.41%, respectively. The 4D ResNet model produced the weakest result, with a test accuracy of 60.00%. These results highlight how crucial it is to choose the right architecture for medical image classification tasks, especially under resource constraints. Given hardware limitations, CNN-based models, in particular the CNN-LSTM hybrid and the second CNN model, show potential for further research in this area.
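
The 5-fold cross-validation protocol mentioned above can be illustrated with a minimal, dependency-free sketch. It is not the study's code: the helper names (`stratified_k_fold`, `mean_accuracy`, `majority_baseline`), the choice to stratify folds by diagnosis label, the synthetic labels, and the placeholder baseline model are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for stratified k-fold cross-validation."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold keeps a similar class ratio.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


def mean_accuracy(labels, evaluate, k=5, seed=0):
    """Average the per-fold test accuracy returned by evaluate(train_idx, test_idx)."""
    scores = [evaluate(tr, te) for tr, te in stratified_k_fold(labels, k, seed)]
    return sum(scores) / len(scores)


# Synthetic labels for illustration only (1 = patient, 0 = control); not COBRE data.
labels = [1] * 20 + [0] * 30


def majority_baseline(train_idx, test_idx):
    """Hypothetical placeholder model: predict the most common training label."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


splits = list(stratified_k_fold(labels, k=5))
cv_accuracy = mean_accuracy(labels, majority_baseline, k=5)
```

In a real experiment, `majority_baseline` would be replaced by a function that trains one of the networks on `train_idx` and returns its accuracy on `test_idx`. Any fitted preprocessing step, such as PCA, would be fit on the training indices only, so that no information from the held-out fold leaks into training.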