Recent speech-based services such as voice assistants and cloud computing services have raised security concerns, since they constantly send users' speech data to a server. Speech data contain sensitive biometric information as well as the spoken words themselves, and can be misused if they fall into the wrong hands. In this context, Signal Processing in the Encrypted Domain (SPED), which combines Homomorphic Encryption (HE) with signal processing, offers a solution. HE enables computation on a user's encrypted data without decrypting it, thus providing security and privacy. In this paper, we present a simple but fast homomorphic Quantized Fourier Transform (QFT) with efficient packing of speech signals. Our work is based on TFHE, the Fully Homomorphic Encryption (FHE) scheme proposed by Chillotti et al. We then present a thorough noise analysis of our QFT that helps keep the noise at a reasonable level. In addition, taking into account how TFHE bootstrapping operates, we statistically analyze the bounds of the QFT coefficients and present a simple criterion for scaling up the coefficients. This criterion helps keep the message precision as high as possible during TFHE bootstrapping. We use it to evaluate the magnitude of the QFT with low latency and reasonable precision. Finally, we provide a proof-of-concept implementation of our QFT. With a ring dimension of 1024 and TFHE parameters that achieve 106 bits of security, we show that the QFT can be evaluated in 35 milliseconds for a single ciphertext of length 1024. This is 74.6 times faster than previous work. We also built a homomorphic end-to-end speech processing framework that processes encrypted speech data and classifies gender (resp. vowel) on the VoxCeleb (resp. PCVC) dataset. Our implementation classifies gender (resp. vowel) with more than 83% (resp. 79%) accuracy in as little as 0.05 (resp. 0.56) seconds using multithreading.
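To make the idea of a quantized transform concrete, the following is a minimal plaintext sketch in Python. It is not the homomorphic implementation described above, and it uses no TFHE library. It only illustrates, under our own assumptions, how a Fourier transform can be carried out with integer arithmetic alone by rounding the twiddle factors to a fixed number of bits, which is the kind of arithmetic an encrypted-domain evaluation has to work with. The function names `quantized_dft` and `exact_dft`, the parameter `twiddle_bits`, and the test signal are all hypothetical and chosen for illustration.

```python
import cmath
import math


def quantized_dft(signal, twiddle_bits=8):
    """Plaintext DFT whose twiddle factors are rounded to integers.

    Illustrative assumption only: the scale 2**twiddle_bits is not a
    parameter taken from the paper.
    Returns (list of (re, im) integer pairs, scale).
    """
    n = len(signal)
    scale = 1 << twiddle_bits
    # Integer lookup tables for scale*cos and scale*sin at the n-th roots of unity.
    cos_q = [round(scale * math.cos(2 * math.pi * j / n)) for j in range(n)]
    sin_q = [round(scale * math.sin(2 * math.pi * j / n)) for j in range(n)]
    out = []
    for k in range(n):
        # X[k] = sum_t x[t] * (cos(2*pi*k*t/n) - i*sin(2*pi*k*t/n)), integers only.
        re = sum(x * cos_q[(k * t) % n] for t, x in enumerate(signal))
        im = -sum(x * sin_q[(k * t) % n] for t, x in enumerate(signal))
        out.append((re, im))
    return out, scale


def exact_dft(signal):
    """Floating-point reference DFT used to measure the quantization error."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


# Hypothetical test signal: an integer-valued sine with 3 cycles over 32 samples.
n = 32
signal = [round(100 * math.sin(2 * math.pi * 3 * t / n)) for t in range(n)]

coeffs, scale = quantized_dft(signal, twiddle_bits=8)
reference = exact_dft(signal)

# Rescale the integer result and compare it with the floating-point reference.
max_error = max(
    abs(complex(re, im) / scale - ref) for (re, im), ref in zip(coeffs, reference)
)
# Squared magnitudes stay in integer arithmetic; look only at the first half of the bins.
magnitudes_sq = [re * re + im * im for re, im in coeffs[: n // 2]]
peak_bin = magnitudes_sq.index(max(magnitudes_sq))
```

Each rounded twiddle factor is off by at most 0.5 before rescaling, so the error of one rescaled coefficient is bounded by roughly the sum of the absolute sample values divided by twice the scale in each of the real and imaginary parts. Increasing `twiddle_bits` reduces this error but makes the integer coefficients larger, which is the precision versus magnitude trade-off that a scaling criterion has to balance.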