In traditional monocular pose estimation algorithms, a convolutional network is typically used to locate several landmarks in the image, and the target pose is then estimated by 2D-3D matching. However, the landmarks on a satellite are widely scattered, and because the receptive field of a convolutional network is limited, landmark localization accuracy is low, which degrades the accuracy of the subsequent pose estimation. In addition, this process requires manual annotation of landmark position labels and target mask labels, which is costly. To address these two problems, a self-attention mechanism is introduced into the convolutional network, giving it global modeling ability and improving landmark localization accuracy. Furthermore, the point cloud of the target is reconstructed through space carving and then reprojected onto the pixel plane to obtain the required labels automatically, which improves the practicality of the algorithm. Experiments show that, on the SPEED dataset, the proposed algorithm achieves a landmark localization accuracy of 92%, a translation error of 0.236%, and a rotation error of 9.86 × 10⁻³ rad, improving accuracy while reducing complexity. It can be effectively applied to relative pose estimation between spacecraft.
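The reprojection step used for automatic labeling can be illustrated with a minimal sketch. It assumes a standard pinhole camera with intrinsic matrix `K`, a known pose (`R`, `t`) for each training image, and that all points lie in front of the camera; the function names `project_points` and `mask_from_pixels` are hypothetical and are not taken from the paper.

```python
import numpy as np

def project_points(points_3d, R, t, K):
    """Project Nx3 target-frame points to Nx2 pixel coordinates (pinhole model).

    Assumes every point has positive depth in the camera frame.
    """
    cam = points_3d @ R.T + t          # target frame -> camera frame
    uvw = cam @ K.T                    # camera frame -> homogeneous pixel coords
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

def mask_from_pixels(pixels, height, width):
    """Rasterize projected points into a sparse boolean mask."""
    mask = np.zeros((height, width), dtype=bool)
    cols = np.round(pixels[:, 0]).astype(int)
    rows = np.round(pixels[:, 1]).astype(int)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    mask[rows[inside], cols[inside]] = True
    return mask

# Illustrative values only (not from the paper or the SPEED dataset).
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 10.0])
cloud = np.array([[0.0, 0.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, -1.0, 0.0]])

pixels = project_points(cloud, R, t, K)   # 2D landmark labels
mask = mask_from_pixels(pixels, 48, 64)   # sparse mask label
```

Projecting selected landmark points yields 2D landmark labels, and projecting the whole reconstructed cloud yields a mask. The mask produced this way is sparse; how the original method densifies it is not stated in the abstract.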