In virtual reality (VR) environments, the omnidirectional nature of the content makes scanpath prediction critical for saving rendering resources and guiding content design. However, compared with the abundance of saliency map prediction models, only a few scanpath prediction models have been proposed. Focusing on scanpath prediction for omnidirectional images, this paper introduces a novel model, the saliency map to scanpath transformer (SMST), which transforms a predicted saliency map into a plausible scanpath. The model comprises three stages: filtering, clustering, and routing. Given a predicted saliency map of a VR scene, we first filter out low-saliency regions to obtain a filtered saliency map. We then build a fixation set by clustering the filtered saliency map according to its saliency distribution and taking the cluster centers as potential fixation points. The fixation points are then fully connected, with edge weights defined by the cylindrical distance and the primary visual features of the original image. On this fully connected fixation graph, scanpath prediction is cast as a routing problem: finding the optimal route that visits every fixation point exactly once. We solve this routing problem with an ant colony optimization algorithm. We evaluate the proposed model on several kinds of predicted saliency maps, and the prediction performance is promising.
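The sketch below illustrates the three-stage pipeline described above (filtering, clustering, routing) under several stated assumptions: the saliency threshold, the use of weighted k-means with a fixed number of clusters, and all ant colony optimization hyper-parameters are illustrative choices, not values from the paper, and the edge weights here use only the cylindrical distance, omitting the visual-feature term.

```python
# Hedged sketch of the SMST pipeline: filtering -> clustering -> ACO routing.
# Thresholds, cluster counts, and ACO hyper-parameters are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def filter_saliency(sal, keep_quantile=0.85):
    """Zero out low-saliency regions, keeping only the most salient pixels."""
    thresh = np.quantile(sal, keep_quantile)
    return np.where(sal >= thresh, sal, 0.0)

def cluster_fixations(filtered_sal, n_fix=8):
    """Cluster salient pixels (weighted by saliency); cluster centres serve as
    candidate fixation points in (row, col) coordinates."""
    ys, xs = np.nonzero(filtered_sal)
    pts = np.stack([ys, xs], axis=1).astype(float)
    w = filtered_sal[ys, xs]
    km = KMeans(n_clusters=n_fix, n_init=10, random_state=0).fit(pts, sample_weight=w)
    return km.cluster_centers_

def cylindrical_distance(p, q, width):
    """Distance on the equirectangular image treated as a cylinder:
    the horizontal axis wraps around, the vertical axis does not."""
    dy = p[0] - q[0]
    dx = abs(p[1] - q[1])
    dx = min(dx, width - dx)          # take the shorter way around the cylinder
    return np.hypot(dy, dx)

def aco_route(fixations, width, n_ants=20, n_iter=100,
              alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Ant colony optimization over the fully connected fixation graph:
    find a route visiting every fixation exactly once with small total weight.
    (An elitist pheromone update is used here; other ACO variants are possible.)"""
    rng = np.random.default_rng(seed)
    n = len(fixations)
    dist = np.array([[cylindrical_distance(p, q, width) for q in fixations]
                     for p in fixations]) + 1e-9
    eta = 1.0 / dist                   # heuristic desirability of each edge
    tau = np.ones((n, n))              # pheromone levels
    best_route, best_len = None, np.inf
    for _ in range(n_iter):
        for _ in range(n_ants):
            route = [rng.integers(n)]
            while len(route) < n:
                i = route[-1]
                mask = np.ones(n, bool)
                mask[route] = False    # forbid already-visited fixations
                p = (tau[i] ** alpha) * (eta[i] ** beta) * mask
                route.append(rng.choice(n, p=p / p.sum()))
            length = sum(dist[route[k], route[k + 1]] for k in range(n - 1))
            if length < best_len:
                best_route, best_len = route, length
        tau *= (1.0 - rho)             # pheromone evaporation
        for k in range(len(best_route) - 1):
            tau[best_route[k], best_route[k + 1]] += 1.0 / best_len
    return [fixations[i] for i in best_route]

# Usage: given a predicted saliency map `sal` (H x W array in [0, 1]):
# scanpath = aco_route(cluster_fixations(filter_saliency(sal)), width=sal.shape[1])
```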