Traffic density estimation can be used to control traffic light signals and thereby support effective traffic management. It can be performed in two steps: vehicle recognition and counting. Deep learning (DL) techniques are increasingly being explored for this task as convolutional neural networks (CNNs) grow in popularity. In this study, data was first collected from several open-source datasets, namely FLIR, KITTI, and MB7500. Vehicles in the images were labelled in six different classes. To deal with the imbalanced dataset, data augmentation techniques were applied. Then, a model based on an ensemble of the faster region-based convolutional neural network (Faster R-CNN) and the single-shot detector (SSD) was trained on the processed datasets. The results of the proposed model were compared with those of the base estimators on the FLIR dataset (thermal and RGB images separately), the MB7500 dataset, and the KITTI dataset. Experimental results show that the highest mean average precision (mAP), 94%, was obtained by the proposed ensemble on the FLIR thermal dataset, which was 34% better than SSD and 6% better than Faster R-CNN. Overall, the proposed ensemble achieves better results than the base estimators. Experimental results also show that detection on thermal images was better than on visible images. In addition, three algorithms were compared on estimated density, and the proposed model shows significant potential for traffic density estimation.
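To make the two-step idea (recognition, then counting) concrete, the following is a minimal sketch of one common way to combine two detectors: pool their detections, remove duplicates with class-wise non-maximum suppression (NMS), and count the surviving boxes per frame as a simple density proxy. The abstract does not state how the ensemble fuses its base estimators, so the fusion rule, the thresholds, and all names here (`Detection`, `ensemble_detections`, `vehicle_counts`) are assumptions for illustration, not the method of this study.

```python
from collections import Counter
from typing import NamedTuple


class Detection(NamedTuple):
    box: tuple    # (x1, y1, x2, y2) in pixels
    score: float  # detector confidence in [0, 1]
    label: str    # vehicle class name


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_detections(dets_a, dets_b, iou_thr=0.5, score_thr=0.3):
    """Pool both detectors' outputs, then apply class-wise greedy NMS.

    Assumed fusion rule: thresholds are illustrative, not from the study.
    """
    pooled = [d for d in list(dets_a) + list(dets_b) if d.score >= score_thr]
    pooled.sort(key=lambda d: d.score, reverse=True)
    kept = []
    for d in pooled:
        # Keep d only if no higher-scoring box of the same class overlaps it.
        if all(k.label != d.label or iou(k.box, d.box) < iou_thr for k in kept):
            kept.append(d)
    return kept


def vehicle_counts(dets):
    """Per-class vehicle counts for one frame (a simple density proxy)."""
    return Counter(d.label for d in dets)


# Hypothetical outputs of two detectors on the same frame.
frcnn_dets = [
    Detection((10, 10, 50, 50), 0.9, "car"),
    Detection((100, 100, 160, 150), 0.8, "bus"),
]
ssd_dets = [
    Detection((12, 11, 52, 49), 0.7, "car"),   # duplicate of the first car
    Detection((200, 40, 240, 80), 0.6, "car"),  # found only by this detector
]

merged = ensemble_detections(frcnn_dets, ssd_dets)
counts = vehicle_counts(merged)
```

In this toy frame the two overlapping car boxes collapse into one, while the car found by only one detector is retained, so the pooled result can recover vehicles that a single detector misses. Other fusion rules, such as score averaging or weighted box fusion, would fit the same interface.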