We propose a method that uses monocular zoom to achieve real-scale 3-D reconstruction of a scene. To reconstruct the scene, we capture a sequence of zoomed-in and zoomed-out images. First, we estimate the zoomed-in camera parameters from the known zoomed-out camera parameters, so the camera needs to be calibrated only once. Then, we use the structure-from-motion (SfM) pipeline COLMAP to recover up-to-scale translations between the images. Because the images form zoom pairs of the same scene, we compute the true scale of the scene from the ratio between the up-to-scale translation within a zoom pair and the difference between the zoomed-out and zoomed-in focal lengths. Finally, we use RAFT-Stereo to compute the scene depth. Specifically, we select two adjacent images taken at the same focal length, rectify them as a stereo pair, and remove the non-covisible regions of the rectified images. This yields more accurate matching between the images and, in turn, a dense real-scale 3-D reconstruction. Experimental results demonstrate that our method performs well on real-scale monocular 3-D reconstruction.
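The calibration-transfer and scale-recovery steps can be illustrated with a minimal NumPy sketch. It rests on two assumptions drawn from our reading of the abstract rather than from confirmed details of the method: zooming changes only the focal length (the principal point and pixel size stay fixed), and the optical centres of a zoom pair are physically displaced by the focal-length difference in millimetres. The helper names `zoomed_in_intrinsics`, `camera_center`, and `metric_scale` are hypothetical; `camera_center` follows the COLMAP world-to-camera pose convention, in which the camera centre is C = -R^T t.

```python
import numpy as np


def zoomed_in_intrinsics(K_out: np.ndarray, zoom_ratio: float) -> np.ndarray:
    """Hypothetical helper: derive zoomed-in intrinsics from calibrated zoomed-out ones.

    Assumes zoom_ratio = f_in / f_out and that zooming leaves the principal
    point and pixel size unchanged, so only fx and fy are scaled.
    """
    K_in = np.array(K_out, dtype=float)  # copy so the input is not modified
    K_in[0, 0] *= zoom_ratio  # fx
    K_in[1, 1] *= zoom_ratio  # fy
    return K_in


def camera_center(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Camera centre C = -R^T t for a world-to-camera pose (COLMAP convention)."""
    return -np.asarray(R, dtype=float).T @ np.asarray(t, dtype=float)


def metric_scale(C_out: np.ndarray, C_in: np.ndarray,
                 f_out_mm: float, f_in_mm: float) -> float:
    """Hypothetical helper: factor converting up-to-scale SfM units to millimetres.

    Assumes the optical centres of a zoom pair are physically displaced by
    |f_in - f_out| millimetres, while SfM reports their distance in arbitrary units.
    """
    baseline_sfm = np.linalg.norm(np.asarray(C_in, float) - np.asarray(C_out, float))
    if baseline_sfm == 0.0:
        raise ValueError("zoom pair has zero up-to-scale baseline")
    return abs(f_in_mm - f_out_mm) / baseline_sfm


# Toy example: zooming from 24 mm to 48 mm; SfM places the two centres 0.5 units apart.
K_out = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
K_in = zoomed_in_intrinsics(K_out, 48.0 / 24.0)
C_out = camera_center(np.eye(3), np.zeros(3))
C_in = camera_center(np.eye(3), np.array([0.0, 0.0, -0.5]))
s = metric_scale(C_out, C_in, f_out_mm=24.0, f_in_mm=48.0)
print(K_in[0, 0], s)  # 2000.0 48.0
```

Multiplying every SfM camera centre and triangulated point by `s` expresses the reconstruction in millimetres under these assumptions. When several zoom pairs are available, combining their individual factors (for example, by taking the median) is one plausible way to reduce noise; this is a choice of the sketch, not necessarily of the method described above.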