Fire is among the most harmful hazards affecting daily life. Existing fire detection methods suffer, to varying degrees, from heavy computation, slow detection speed, and low detection accuracy, and they fail to strike a good trade-off among model complexity, accuracy, and detection speed. This paper proposes a multiscale fire image detection method that combines a Convolutional Neural Network (CNN) with a Transformer. In the shallow layers of the model, a CNN-based multiscale feature extraction module captures rich fire image information. In the deep layers, the Transformer's strong global learning ability provides overall perception and a macroscopic understanding of the image. Experimental results show that the model reaches a best detection accuracy of 94.62% and a fastest detection speed of 158.12 FPS, with the F1 score stable at around 94%, which makes it capable of accurate, real-time fire detection. Compared with existing detection methods, the proposed method achieves higher detection accuracy at similar model complexity and detection speed, and faster detection at similar accuracy. It therefore achieves a better balance among model complexity, detection speed, and accuracy.
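For readers less familiar with the reported metrics, the following minimal sketch shows how accuracy, F1 score, and per-image latency are conventionally computed for a binary fire / no-fire classifier. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names and the confusion counts in the usage lines are hypothetical, and only the 158.12 FPS figure is taken from the results above.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all images that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the 'fire' class."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def latency_ms(fps: float) -> float:
    """Average time spent per image, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical confusion counts, chosen only to illustrate the formulas.
    tp, tn, fp, fn = 94, 94, 6, 6
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"F1       = {f1_score(tp, fp, fn):.4f}")
    # 158.12 FPS is the fastest detection speed reported above.
    print(f"latency  = {latency_ms(158.12):.2f} ms per image")
```

Under this conversion, a throughput of 158.12 FPS corresponds to roughly 6.3 ms per image, which is well within the per-frame budget of a typical 30 FPS video stream (about 33 ms).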