To address the challenges of vehicle detection in remote sensing images, such as complex backgrounds, large scale variation, and the difficulty of detecting small targets, a detection method named GEM_YOLO, based on bidirectional multi-scale feature fusion, is proposed. The method has three main parts. First, a globally efficient attention module is designed as the feature extractor to achieve lightweight and efficient feature extraction and to cope with object detection in complex backgrounds. Second, a bidirectional multi-scale feature fusion network is proposed as the feature fusion component; it adopts top-down and bottom-up fusion strategies to promote information exchange between features at different levels. Finally, an attention-based dynamic detection head is used as the predictor; it enhances the perception of different scales, spatial positions, and tasks, further improving detection accuracy and robustness. Experiments on the public datasets DIOR and DOTA show that the average accuracy reaches 92.4% and 81.4%, respectively, significantly higher than that of other mainstream detection methods. Meanwhile, the smaller number of parameters and lower computational complexity make the method an efficient solution for vehicle detection in remote sensing images. © 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
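The top-down and bottom-up fusion strategy mentioned in the abstract can be illustrated with a minimal sketch. This is not the GEM_YOLO network itself: the abstract does not specify its layers, so the sketch assumes a generic feature pyramid in which each level halves the height and width of the previous one, and it replaces learned convolutions with plain element-wise addition. The names `upsample2x`, `downsample2x`, `add`, and `bidirectional_fuse` are hypothetical and chosen only for illustration.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map


def upsample2x(x: Grid) -> Grid:
    """Nearest-neighbour upsampling: each cell becomes a 2x2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(x: Grid) -> Grid:
    """2x2 average pooling with stride 2 (assumes even height and width)."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two maps of equal size (stand-in for a learned fusion)."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_fuse(levels: List[Grid]) -> List[Grid]:
    """levels[0] is the finest map; each later level halves height and width."""
    # Top-down pass: carry coarse context into the finer maps.
    td = list(levels)
    for i in range(len(levels) - 2, -1, -1):
        td[i] = add(levels[i], upsample2x(td[i + 1]))
    # Bottom-up pass: carry fine detail back into the coarser maps.
    out = list(td)
    for i in range(1, len(td)):
        out[i] = add(td[i], downsample2x(out[i - 1]))
    return out


def ones(n: int) -> Grid:
    """Helper for the example: an n x n map filled with 1.0."""
    return [[1.0] * n for _ in range(n)]


fused = bidirectional_fuse([ones(4), ones(2), ones(1)])
```

After both passes every level keeps its original resolution but has received information from all other levels, which is the property the abstract attributes to bidirectional fusion. A real detector would insert learned convolutions (and often learned fusion weights) where this sketch uses addition.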