In past studies, atrous convolution is efficient in segmentation tasks to reinforce the receptive field and detection tasks. In addition, the attention module is efficient for feature extraction and enhancement. In this paper, we introduce atrous convolution, design a feature enhancement module, and utilize a plug-and-play technique, i.e., (AFE) module. Atrous convolution has been shown to be essential for expanding the perceptual field in past studies. We achieve this by fusing multiple layers of features of atrous convolution and adding a detection head to cope with the problem of varying object size scales. We achieve the purpose of extracting multi-scale contextual feature information while using an attention mechanism to effectively enhance the features and improve the overall multi-scale detection performance of the model. It can be added to a well-established backbone network or neck network. Therefore, based on this, we designed the C3 based on the atrous convolution (C3AT) module on the AFE module, replaced the C3 module in YOLOv5, and proposed the Multi-Scale Contextual Feature Enhancement Network (MCANet) as the neck network to obtain the final network structure. Experimental results indicate that the proposed method significantly improves inference speed and AP compared to the benchmark model. Single-model object detection results on the VisDrone2021 test set-dev dataset achieved 32.7% AP and 52.2%AP(50), a significant improvement of 8.1% AP and 11.4%AP(50) compared with the baseline model. The single-model object detection results on the VOC2007 test dataset reached 89.6% mAP.