As autonomous driving technology rapidly advances, accurate detection and classification of traffic signs have become pivotal to driver safety and to the evolution of autonomous vehicles. Individual models, however, have inherent limitations. YOLOv8 is renowned for its fast detection and its proficiency at identifying distant objects, but the constraints of its grid-cell architecture lead to suboptimal performance on small, close-range targets. Conversely, Mask Region-based Convolutional Neural Network (Mask R-CNN) detects close-range objects with high precision, yet its distant-object detection and inference speed leave room for improvement. To overcome these obstacles, we introduce a novel model that integrates the strengths of Mask R-CNN and YOLOv8 through a stacking ensemble technique. The model was evaluated on the CCTSDB and MTSD datasets, demonstrating superior performance across a variety of conditions. On MTSD, it achieves a 3.63% improvement in mean Average Precision (mAP) and a 2.35% increase in Frames Per Second (FPS) over Mask R-CNN, as well as a 3.20% mAP gain over YOLOv8. Moreover, the proposed model exhibits notable precision in challenging scenarios such as ultra-long-distance detection, shadow occlusion, motion blur, and complex environments with diverse sign categories. These findings demonstrate the model's robustness and support the continued development of intelligent transportation systems and autonomous driving technology.
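The core idea of combining two detectors' outputs can be sketched, for illustration, as confidence-weighted fusion of their predicted boxes. This is a minimal, self-contained simplification, not the paper's actual stacking implementation: the function names, the IoU matching threshold, and the weighting scheme are all assumptions for demonstration purposes.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse_detections(dets_a, dets_b, iou_thr=0.5):
    """Match detections from two models by IoU; fuse matched pairs by
    confidence-weighted box averaging, keep unmatched detections as-is.
    Each detection is a (box, score) pair."""
    fused, used_b = [], set()
    for box_a, score_a in dets_a:
        best_j, best_iou = -1, iou_thr
        for j, (box_b, score_b) in enumerate(dets_b):
            if j in used_b:
                continue
            v = iou(box_a, box_b)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            box_b, score_b = dets_b[best_j]
            used_b.add(best_j)
            w = score_a + score_b
            # Weight each coordinate by the two models' confidences.
            box = [(score_a * ca + score_b * cb) / w
                   for ca, cb in zip(box_a, box_b)]
            fused.append((box, max(score_a, score_b)))
        else:
            fused.append((list(box_a), score_a))
    # Detections seen only by the second model are kept unchanged.
    for j, (box_b, score_b) in enumerate(dets_b):
        if j not in used_b:
            fused.append((list(box_b), score_b))
    return fused
```

For example, if one model reports a box at `[0, 0, 10, 10]` with score 0.9 and the other reports an overlapping box at `[1, 1, 11, 11]` with score 0.8 plus a separate box elsewhere, the overlapping pair is merged into a single averaged box while the lone detection survives untouched.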