Endoscopic imaging is a visualization technology for the human gastrointestinal tract that enables detection of abnormalities, characterization of lesions, and guidance for therapeutic interventions. Accurate and reliable classification of endoscopy images remains challenging due to variations in image quality, diverse anatomical structures, and subtle abnormalities such as polyps and ulcers. Convolutional Neural Networks (CNNs) are widely used in modern medical imaging, especially for abnormality classification tasks. However, relying on a single CNN classifier limits the model's ability to capture the full complexity and variability of endoscopy images. A potential solution is ensemble learning, which combines multiple models to reach a final decision. Nevertheless, this approach presents several challenges, notably a significant risk of data bias. This issue arises from the unequal influence of weak and strong learners in most ensemble strategies, such as standard voting, which usually depend on certain assumptions, including equal performance among the models. Such assumptions reduce the ensemble's capacity for collaboration among diverse models. Therefore, this paper proposes two solutions to these problems. First, we create a diverse pool of CNNs using an end-to-end approach, which promotes model diversity and enhances confidence in the final decision. Second, we propose employing Particle Swarm Optimization to optimize the weights of the members of the ensemble learner in order to create a more resilient and accurate model than the standard ensemble learning approach. The experiments demonstrate that the proposed ensemble model outperforms the baseline model on both the Kvasir 1 and Kvasir 2 datasets, highlighting the effectiveness of the proposed approach in capturing and integrating diverse information from the baseline model.
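The weighting idea can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes each CNN in the pool has already produced class probabilities on a held-out validation set, and it uses validation accuracy of the weighted soft vote as the fitness function; the abstract does not state the fitness actually used. The function names (`weighted_vote`, `accuracy`, `pso_weights`) and the hyperparameters (`inertia`, `c1`, `c2`, swarm size, iteration count) are hypothetical choices made for illustration.

```python
import numpy as np

def weighted_vote(probs, weights):
    """Weighted soft voting.
    probs: array (n_models, n_samples, n_classes); weights: array (n_models,)."""
    w = weights / weights.sum()
    # Contract the model axis: result has shape (n_samples, n_classes).
    return np.tensordot(w, probs, axes=1).argmax(axis=1)

def accuracy(probs, labels, weights):
    """Fitness (an assumption of this sketch): accuracy of the weighted vote."""
    return float((weighted_vote(probs, weights) == labels).mean())

def pso_weights(probs, labels, n_particles=20, n_iters=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for ensemble weights with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    n_models = probs.shape[0]
    pos = rng.random((n_particles, n_models))
    pos[0] = 1.0  # one particle starts at equal weights (standard soft voting)
    vel = np.zeros_like(pos)

    pbest = pos.copy()
    pbest_fit = np.array([accuracy(probs, labels, p) for p in pos])
    g = pbest_fit.argmax()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity update: inertia + pull toward personal and global bests.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        # Keep weights strictly positive so normalization is well defined.
        pos = np.clip(pos + vel, 1e-6, 1.0)

        fit = np.array([accuracy(probs, labels, p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]

        g = pbest_fit.argmax()
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    return gbest / gbest.sum(), float(gbest_fit)
```

Because one particle is initialized at equal weights, the returned validation fitness in this sketch is never lower than that of standard equal-weight soft voting on the same validation data; whether the gain carries over to test data depends on the models and the dataset.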