This study proposes an unsupervised monocular depth estimation model for autonomous drone flight. It addresses two limitations: the high cost and large size of binocular depth estimation hardware, and the large number of ground-truth depth maps that supervised learning requires for training. The model first converts the input image into an image pyramid to reduce the impact of varying target sizes on depth estimation. The autoencoder network used for image reconstruction is built on ResNet-50 for feature extraction. The right (or left) pyramid images are then reconstructed from the left (or right) input images by bilinear sampling, and the corresponding pyramid disparity maps are generated. Finally, the training loss is computed as a combination of a disparity smoothness loss, an image reconstruction loss based on structural similarity, and a disparity consistency loss. Experimental results on KITTI and Make3D indicate that the model is more accurate and faster than other monocular depth estimation methods. When trained on KITTI, the model essentially meets the accuracy and real-time requirements of depth estimation for autonomous drone flight.
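
The pipeline summarized above (image pyramid, sampling-based reconstruction, and a three-term loss) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes grayscale, rectified stereo pairs, replaces the ResNet-50 autoencoder with disparity maps passed in directly, shows only the left-view terms, and uses a plain (not edge-aware) smoothness term. All function names, the 3x3 SSIM window, and the loss weights `w_ap`, `w_ds`, and `w_lr` are hypothetical choices made for illustration.

```python
import numpy as np

def build_pyramid(image, levels=4):
    """Return `levels` images, each half the resolution of the previous (2x2 mean pooling)."""
    pyramid = [image]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        prev = prev[:h, :w]
        pyramid.append(0.25 * (prev[0::2, 0::2] + prev[1::2, 0::2]
                               + prev[0::2, 1::2] + prev[1::2, 1::2]))
    return pyramid

def warp_horizontal(source, disparity):
    """Sample `source` at x - disparity. For rectified pairs rows align, so
    bilinear sampling reduces to linear interpolation along x."""
    h, w = source.shape
    xs = np.clip(np.arange(w, dtype=np.float64)[None, :] - disparity, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * source[rows, x0] + frac * source[rows, x1]

def box3(x):
    """3x3 mean filter over valid windows (assumed window size)."""
    h, w = x.shape
    return sum(x[i:i + h - 2, j:j + w - 2] for i in range(3) for j in range(3)) / 9.0

def ssim_loss(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """Mean of (1 - SSIM) / 2 over all 3x3 windows; 0 for identical images."""
    mu_a, mu_b = box3(a), box3(b)
    var_a = box3(a * a) - mu_a ** 2
    var_b = box3(b * b) - mu_b ** 2
    cov = box3(a * b) - mu_a * mu_b
    ssim = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
    return np.clip((1.0 - ssim) / 2.0, 0.0, 1.0).mean()

def smoothness_loss(disparity):
    """Mean absolute disparity gradient in x and y (simplified, not edge-aware)."""
    return (np.abs(np.diff(disparity, axis=1)).mean()
            + np.abs(np.diff(disparity, axis=0)).mean())

def total_loss(left, right, disps_left, disps_right, w_ap=1.0, w_ds=0.1, w_lr=1.0):
    """Sum the three loss terms over all pyramid levels (left view only).
    Disparities are given in pixels at each level's own resolution."""
    levels = len(disps_left)
    loss = 0.0
    for l_img, r_img, d_l, d_r in zip(build_pyramid(left, levels),
                                      build_pyramid(right, levels),
                                      disps_left, disps_right):
        recon_left = warp_horizontal(r_img, d_l)   # left view rebuilt from right
        d_r_to_l = warp_horizontal(d_r, d_l)       # right disparity seen from left
        loss += w_ap * ssim_loss(recon_left, l_img)
        loss += w_ds * smoothness_loss(d_l)
        loss += w_lr * np.abs(d_l - d_r_to_l).mean()
    return float(loss)

# Synthetic pair with a uniform 2-pixel disparity at full resolution.
rng = np.random.default_rng(0)
left = rng.random((32, 64))
right = np.roll(left, -2, axis=1)
disps = [np.full(img.shape, 2.0 / 2 ** i)
         for i, img in enumerate(build_pyramid(left, 4))]
print(total_loss(left, right, disps, disps))
```

In an actual training setup the disparity pyramids would be predicted by the network from a single input image, and the symmetric right-view terms would be added in the same way.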