The US National Institute of Standards and Technology (NIST) initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, which has since attracted efforts to optimize its implementation on various hardware architectures for practical applications. Recently, Mitaka was proposed as an alternative to Falcon that allows most of its operations to be executed in parallel. These advancements motivate us to develop high-throughput implementations of the Falcon and Mitaka signature schemes on Graphics Processing Units (GPUs), a massively parallel architecture widely available on cloud service platforms. In this article, we present the first parallel implementation of Falcon on various GPUs. We develop an iterative version of the sampling process in Falcon, which is the most time-consuming Falcon operation. This allows us to implement Falcon signature generation on GPUs without relying on expensive recursive function calls. In addition, we propose a parallel random sample generation approach to accelerate Mitaka on GPUs. We evaluate our implementation techniques on state-of-the-art GPU architectures (RTX 3080, A100, T4 and V100). Experimental results show that our Falcon-512 implementation achieves 58,595 signatures/second and 2,721,562 verifications/second on an A100 GPU, which is 20.03x and 29.51x faster, respectively, than the highly optimized AVX2 implementation on CPU. Our Mitaka implementation achieves 161,985 signatures/second and 1,421,046 verifications/second on the same GPU. Owing to its parallelizable sampling process, Mitaka signature generation achieves approximately 2x to 20x higher throughput than Falcon on various GPUs. The high-throughput signature generation and verification achieved by this work can ...