Radar-based continuous human activity recognition (CHAR) is of paramount importance in various applications, including security monitoring, smart homes, and human-computer interaction. However, traditional deep learning models operating on a single-domain data representation cannot fully exploit the information contained in radar data, which limits classification performance. Therefore, this article proposes a radar-based CHAR method built on a multidomain fusion vision transformer (ViT). The proposed approach leverages the complementary feature information of three 2-D representations of radar data, namely, range-time maps (RTMs), range-Doppler maps (RDMs), and Doppler-time maps (DTMs). Multidomain fusion ViT networks that incorporate data-, feature-, and decision-level fusion are designed and investigated for the radar-based CHAR task. In particular, we replace the multilayer perceptron (MLP) in the transformer encoder with a locally enhanced feedforward network (LeFF), which improves the model's ability to capture local context information while keeping model complexity low. Experimental results on publicly available datasets demonstrate that the proposed multidomain fusion ViT network with decision-level fusion improves classification performance compared with existing convolutional neural network (CNN)- and recurrent neural network (RNN)-based classifiers.
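To make the LeFF idea concrete, the sketch below shows a minimal PyTorch implementation of a locally enhanced feedforward block in the style commonly used to replace the MLP in a transformer encoder: a linear expansion, a depthwise convolution over the spatially rearranged patch tokens to inject local context, and a linear projection back to the embedding dimension. The specific hyperparameters (embedding dimension, expansion ratio, kernel size) and the treatment of the class token are assumptions for illustration; the abstract does not specify the exact configuration used in this work.

```python
import torch
import torch.nn as nn


class LeFF(nn.Module):
    """Minimal sketch of a locally enhanced feedforward block (assumed
    configuration): Linear -> depthwise Conv2d over patch tokens -> Linear."""

    def __init__(self, dim=192, expansion=4, kernel_size=3):
        super().__init__()
        hidden = dim * expansion
        self.expand = nn.Sequential(nn.Linear(dim, hidden), nn.GELU())
        self.dwconv = nn.Sequential(
            # groups=hidden makes the convolution depthwise: one spatial
            # filter per channel, which keeps the parameter count low.
            nn.Conv2d(hidden, hidden, kernel_size,
                      padding=kernel_size // 2, groups=hidden),
            nn.GELU(),
        )
        self.project = nn.Linear(hidden, dim)

    def forward(self, x):
        # x: (batch, 1 + H*W, dim) -- class token followed by patch tokens.
        cls_token, patches = x[:, :1], x[:, 1:]
        b, n, _ = patches.shape
        h = w = int(n ** 0.5)  # assumes a square patch grid
        patches = self.expand(patches)
        # Rearrange tokens back onto the 2-D grid so the depthwise
        # convolution can aggregate local context between neighboring patches.
        patches = patches.transpose(1, 2).reshape(b, -1, h, w)
        patches = self.dwconv(patches)
        patches = patches.flatten(2).transpose(1, 2)
        patches = self.project(patches)
        # The class token bypasses the local enhancement unchanged.
        return torch.cat([cls_token, patches], dim=1)


# Usage example on a dummy token sequence (1 class token + 14x14 patches).
tokens = torch.randn(2, 1 + 14 * 14, 192)
out = LeFF(dim=192)(tokens)
print(out.shape)  # torch.Size([2, 197, 192])
```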