Unmanned aerial vehicles (UAVs) possess high mobility and a wide field of view, which leads to challenges in aerial images such as a high proportion of small objects, large variation in object size, object aggregation, and complex backgrounds. Existing object detection methods often overlook the texture information in high-level features, which is crucial for detecting small objects against complex backgrounds. To improve the detection of small objects in complex scenes, we propose an efficient feature aggregation network (EFA-Net) based on YOLOv7. The backbone seamlessly integrates a lightweight hybrid feature extraction module (LHFE), which replaces standard convolutions with depthwise convolutions and employs a hybrid channel attention mechanism to capture local and global information concurrently. This design effectively reduces the number of parameters without sacrificing detection accuracy and enhances the network’s representational capacity. In the neck, we design an adaptive multi-scale feature fusion module (AMSFM) that improves the model’s adaptability to small objects and complex backgrounds by fusing multi-scale features with high-level semantic information and capturing the texture information in high-level features. Additionally, we incorporate a residual spatial pyramid pooling (RSPP) module to strengthen the fusion of information from different receptive fields and to reduce the interference of complex backgrounds with small object detection. To further improve the model’s robustness and generalization ability, we propose an enhanced complete intersection over union (ECIoU) loss function that balances the influence of large and small objects during training. Experimental results demonstrate the effectiveness of the proposed method, which achieves mAP50 scores of 51.6% and 48.5%, and mAP scores of 29.6% and 29.5%, on the VisDrone 2019 and UAVDT datasets, respectively.
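To make the backbone idea concrete, the following PyTorch-style sketch shows what a lightweight block that combines depthwise convolution with a hybrid channel attention mechanism might look like. It is a minimal illustration only: the class names, layer widths, and the way the global and local attention branches are mixed are assumptions, not the LHFE design reported in the paper.

```python
import torch
import torch.nn as nn

class HybridChannelAttention(nn.Module):
    """Illustrative hybrid channel attention: a global squeeze-excitation-style
    branch plus a local pointwise branch (assumed mixing, not the paper's exact design)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.SiLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        self.local = nn.Conv2d(channels, channels, kernel_size=1)  # position-wise channel response
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        g = self.fc(self.global_pool(x))   # global channel statistics
        l = self.local(x)                  # local channel context
        return x * self.sigmoid(g + l)     # reweight channels with both cues

class LHFEBlockSketch(nn.Module):
    """Hypothetical lightweight block: depthwise conv for spatial mixing,
    pointwise conv for channel mixing, hybrid channel attention, residual add."""
    def __init__(self, channels):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                            groups=channels, bias=False)   # depthwise convolution
        self.pw = nn.Conv2d(channels, channels, kernel_size=1, bias=False)  # pointwise convolution
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU(inplace=True)
        self.attn = HybridChannelAttention(channels)

    def forward(self, x):
        y = self.act(self.bn(self.pw(self.dw(x))))
        return x + self.attn(y)

# Example: a 256-channel backbone feature map keeps its shape after the block.
# feat = torch.randn(1, 256, 80, 80)
# out = LHFEBlockSketch(256)(feat)
```

Replacing a standard 3x3 convolution with a depthwise plus pointwise pair reduces the parameter and multiply-accumulate cost roughly by the kernel area (about 9x for 3x3 kernels) when the channel count is large, which is the kind of saving the abstract attributes to the LHFE design.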
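As background for the loss mentioned above, the standard complete intersection over union (CIoU) loss that ECIoU presumably extends is

$$L_{CIoU} = 1 - IoU + \frac{\rho^{2}(b,\, b^{gt})}{c^{2}} + \alpha v, \qquad v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v},$$

where b and b^{gt} are the centers of the predicted and ground-truth boxes, ρ(·) is the Euclidean distance between them, c is the diagonal length of the smallest box enclosing both, and w, h (w^{gt}, h^{gt}) are the predicted (ground-truth) width and height. A scale-aware reweighting of this loss, for example one that keeps small boxes from being dominated by large ones in the regression gradient, is one way to balance large and small objects during training; the specific enhancement used in ECIoU is not given in the abstract.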