At present, cross-view matching methods for remote sensing images cannot match directly against large-scale satellite images, struggle to meet the requirements of matching tasks in large, complex scenes, and depend on large-scale training datasets, which limits their generalization ability. To address these problems, this paper proposes a cross-view remote sensing image matching method based on viewpoint transformation, combining quality-aware template matching with a multi-scale feature fusion algorithm. In this method, multi-view ground images are collected with handheld photographic equipment, whose portability and flexibility make it easy to cover the target area from multiple viewpoints. The acquired images are densely matched to generate point cloud data, and principal component analysis is used to fit the best ground plane and perform a projection transformation that converts the ground-level side view into an aerial view. A feature fusion module is then designed for the VGG19 network, fusing the low-, mid-, and high-level features extracted from the remote sensing image to obtain rich spatial and semantic information; these fused features, carrying both semantic and spatial information, are robust to large scale differences. Finally, quality-aware template matching is applied: the features extracted from the ground images are matched against the fused features of the remote sensing image to produce soft-ranking matching results, and non-maximum suppression is used to select high-quality matches. Experimental results show that the proposed method achieves high accuracy and strong generalization without requiring large-scale datasets: the average matching success rate is 64.6%, and the average center-point offset is 5.9 pixels. The matching results are accurate and complete, providing a new solution for cross-view image matching in large scenes.
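The abstract does not give implementation details for the viewpoint-transformation step. The following is a minimal sketch, assuming NumPy and an (N, 3) point cloud from dense matching, of how PCA can fit the dominant ground plane and project the points into a top-down (aerial-view) coordinate frame; the function names and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def fit_ground_plane_pca(points):
    """Fit the dominant plane of an (N, 3) point cloud with PCA.

    Returns the plane centroid and a unit normal: the eigenvector of the
    covariance matrix with the smallest eigenvalue.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                   # smallest-variance direction
    return centroid, normal

def project_to_birds_eye(points, centroid, normal):
    """Project 3-D points onto the fitted plane and return 2-D top-down
    coordinates, approximating the side-view to aerial-view conversion."""
    # Build an orthonormal basis (u, v) spanning the ground plane.
    u = np.cross(normal, [0.0, 0.0, 1.0])
    if np.linalg.norm(u) < 1e-8:             # plane normal already vertical
        u = np.array([1.0, 0.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    centered = points - centroid
    return np.stack([centered @ u, centered @ v], axis=1)

if __name__ == "__main__":
    # Toy cloud: a noisy tilted plane stands in for the dense-matching output.
    rng = np.random.default_rng(0)
    xy = rng.uniform(-10, 10, size=(500, 2))
    z = 0.1 * xy[:, 0] + 0.05 * xy[:, 1] + rng.normal(0, 0.02, 500)
    cloud = np.column_stack([xy, z])
    c, n = fit_ground_plane_pca(cloud)
    bev = project_to_birds_eye(cloud, c, n)
    print("plane normal:", np.round(n, 3), "BEV coordinates shape:", bev.shape)
```

The resulting 2-D bird's-eye-view coordinates would then be rasterized into an image before the VGG19 feature fusion and quality-aware template matching stages described above.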