Diabetic retinopathy (DR), which arises from the damaging effects of blood sugar fluctuations on retinal vessels, is a critical concern in ophthalmology and frequently evades early detection because it produces no initial symptoms. To address this challenge, this study presents a novel classification method for grading DR severity, laying the groundwork for an early warning system. Five deep learning models (VGG16, VGG19, EfficientNetB5, EfficientNetB7, and EfficientNetV2S) were trained and evaluated on the imbalanced Asia Pacific Tele-Ophthalmology Society (APTOS) 2019 dataset. The study compares the VGG models, which are architecturally simple but have a higher parameter count and are therefore more computationally and memory-intensive, with the EfficientNet models, which achieve their efficiency through optimal network scaling. Selecting two VGG variants and three EfficientNet models allowed a comprehensive analysis of how model complexity, parameter count, and computational efficiency affect DR classification performance. In addition, ensemble techniques, namely hard voting, soft voting, and stacked generalization, were employed to improve classification performance by counteracting the impact of dataset imbalance. Among the individual models, EfficientNetB5 had the lowest accuracy at 88.12% and EfficientNetB7 the highest at 94.07%. The soft and hard voting ensembles improved on this, reaching an accuracy of 94.84%. Stacked generalization was the most effective approach, with an accuracy of 95.55%.
These findings indicate that the ensemble models outperform the individual models and exceed the accuracies reported in the existing literature by mitigating the influence of data imbalance on classification accuracy.
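To illustrate how the two voting schemes named above differ, the following is a minimal Python sketch. It is not the study's implementation. The function names `soft_vote` and `hard_vote`, the three-model setup, and the three-class toy probabilities are assumptions made purely for illustration. Soft voting averages the per-class probabilities across models before choosing a class, whereas hard voting lets each model cast one vote for its top class. Stacked generalization, which would instead train a meta-learner on the base models' outputs, is not shown.

```python
import numpy as np


def soft_vote(probs: np.ndarray) -> np.ndarray:
    """Average class probabilities over models, then pick the best class.

    probs has shape (n_models, n_samples, n_classes).
    """
    return probs.mean(axis=0).argmax(axis=1)


def hard_vote(probs: np.ndarray) -> np.ndarray:
    """Each model votes for its top class; the most-voted class wins.

    Ties are broken in favour of the lowest class index (argmax behaviour).
    """
    n_classes = probs.shape[2]
    votes = probs.argmax(axis=2)                    # (n_models, n_samples)
    counts = np.eye(n_classes)[votes].sum(axis=0)   # (n_samples, n_classes)
    return counts.argmax(axis=1)


# Hypothetical softmax outputs of three models for two images (toy values).
toy_probs = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.2, 0.7]],   # model A
    [[0.4, 0.6, 0.0], [0.2, 0.3, 0.5]],   # model B
    [[0.4, 0.6, 0.0], [0.3, 0.4, 0.3]],   # model C
])

print("soft:", soft_vote(toy_probs))
print("hard:", hard_vote(toy_probs))
```

For the first toy image the two schemes disagree: two of the three models narrowly prefer class 1, so hard voting selects it, while the single confident prediction for class 0 dominates the averaged probabilities under soft voting.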