Transportation mode recognition is an important and challenging problem in intelligent transportation systems. Over the years, many data mining and deep learning methods have been proposed, but these methods have deficiencies and do not fully exploit the temporal and spatial information hidden in raw GPS trajectory data. In this paper, we present a new deep learning model, TSANET, which uses an attention mechanism to combine global and local spatiotemporal features. The model takes as input seven kinematic features that are derived from the GPS trajectories and capture the information hidden in the GPS points. The model adopts a TCN and an ST-Block to extract global and local spatiotemporal features, respectively. In the ST-Block, a BiGRU captures temporal features and a DenseNet captures spatial features; in the TCN, convolution operations extract both temporal and spatial information. Moreover, a dual-layer attention mechanism integrates and reconstructs these features by assigning different weights to the global and local spatiotemporal features extracted by the TCN and the ST-Block. By comprehensively considering global and local features, the model greatly improves classification performance and recognition granularity, and with the introduction of the attention mechanism it can identify all transportation modes in a fine-grained manner. Finally, experiments were conducted on the Geo1 and Geo2 datasets used by many researchers. The results show that our model not only achieves the highest accuracy (94.34%) among recent methods but also clearly identifies all seven transportation modes, verifying the advantages and effectiveness of the model.
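
To make the two-branch design concrete, the following is a minimal sketch of the architecture described above, written in PyTorch. The layer sizes, the exact TCN and DenseNet-style configurations, and the precise form of the dual-layer attention are assumptions made purely for illustration and do not reproduce the authors' implementation.

```python
# Minimal sketch: two branches (TCN for global features, ST-Block for local
# features) fused by an attention layer, classifying 7 transportation modes.
# All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TCNBranch(nn.Module):
    """Global feature extractor: stacked dilated 1-D convolutions."""
    def __init__(self, in_ch=7, hidden=64, levels=3):
        super().__init__()
        layers, ch = [], in_ch
        for i in range(levels):
            d = 2 ** i  # growing dilation widens the temporal receptive field
            layers += [nn.Conv1d(ch, hidden, kernel_size=3, padding=d, dilation=d),
                       nn.ReLU()]
            ch = hidden
        self.net = nn.Sequential(*layers)

    def forward(self, x):                 # x: (batch, 7, seq_len)
        return self.net(x).mean(dim=2)    # (batch, hidden) after temporal pooling


class STBlock(nn.Module):
    """Local feature extractor: BiGRU (temporal) + densely connected convs (spatial)."""
    def __init__(self, in_ch=7, hidden=64):
        super().__init__()
        self.bigru = nn.GRU(in_ch, hidden // 2, batch_first=True, bidirectional=True)
        # Dense-style stack: the second conv sees the input plus the first conv's output.
        self.conv1 = nn.Conv1d(in_ch, hidden, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(in_ch + hidden, hidden, kernel_size=3, padding=1)

    def forward(self, x):                              # x: (batch, 7, seq_len)
        t, _ = self.bigru(x.transpose(1, 2))           # (batch, seq_len, hidden)
        temporal = t.mean(dim=1)                       # (batch, hidden)
        c1 = F.relu(self.conv1(x))
        c2 = F.relu(self.conv2(torch.cat([x, c1], dim=1)))
        spatial = c2.mean(dim=2)                       # (batch, hidden)
        return temporal + spatial


class AttentionFusion(nn.Module):
    """Assigns weights to the global and local features before fusing them."""
    def __init__(self, hidden=64):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, global_feat, local_feat):        # each (batch, hidden)
        stacked = torch.stack([global_feat, local_feat], dim=1)  # (batch, 2, hidden)
        weights = F.softmax(self.score(stacked), dim=1)          # (batch, 2, 1)
        return (weights * stacked).sum(dim=1)                    # (batch, hidden)


class TSANetSketch(nn.Module):
    def __init__(self, n_features=7, n_modes=7, hidden=64):
        super().__init__()
        self.tcn = TCNBranch(n_features, hidden)
        self.st = STBlock(n_features, hidden)
        self.fusion = AttentionFusion(hidden)
        self.classifier = nn.Linear(hidden, n_modes)

    def forward(self, x):                 # x: (batch, n_features, seq_len)
        return self.classifier(self.fusion(self.tcn(x), self.st(x)))


if __name__ == "__main__":
    segments = torch.randn(8, 7, 200)     # 8 trajectory segments, 7 kinematic features
    logits = TSANetSketch()(segments)
    print(logits.shape)                   # torch.Size([8, 7])
```

In this sketch the attention layer simply softmax-weights the two branch outputs before fusion; the paper's dual-layer mechanism is more elaborate, but the example shows where the global and local features meet and how the seven-way classification is produced.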