Homomorphic encryption (HE) is a type of cryptography that allows computations to be performed on encrypted data. The technique relies on learning with errors problem, where data is hidden under noise for security. To avoid excessive noise, bootstrapping is used to reset the noise level in the ciphertext, but it requires a large key and is computationally expensive. The fully homomorphic encryption over the torus (TFHE) scheme offers a faster and programmable bootstrapping (PBS) algorithm, which is crucial for many privacy-focused applications. Nonetheless, the current TFHE scheme does not support ciphertext packing, resulting in low-throughput performance. To the best of our knowledge, this is the first work that thoroughly analyzes TFHE bootstrapping, identifies the TFHE acceleration bottleneck in GPUs, and proposes a hardware TFHE accelerator to solve the bottleneck. We begin by identifying the TFHE acceleration bottleneck in GPUs due to the blind rotation fragmentation problem. This can be improved by increasing the batch size in PBS. We propose a two-level batching approach to enhance the batch size in PBS. To implement this solution efficiently, we introduce Strix, utilizing a streaming and fully pipelined architecture with specialized units to accelerate ciphertext processing in TFHE. Specifically, we propose a novel microarchitecture for decomposition in TFHE, suitable for processing streaming data at high throughput. We also employ a fully-pipelined FFT microarchitecture to address the memory access bottleneck and improve its performance through a folding scheme, achieving 2x throughput improvement and 1.7x area reduction. Strix achieves over 1, 067x and 37x higher throughput in running TFHE with PBS than the state-of-the-art implementation on CPU and GPU, respectively, surpassing the state-of-the-art TFHE accelerator, MATCHA, by 7.4x.