In recent years, Generative Adversarial Networks (GANs) have been widely adopted for computer vision tasks such as large-scale image synthesis and 3D object modeling. Hardware acceleration of transposed convolution layers is especially essential, since the generative model (generator), a critical component of GANs, is computationally intensive in nature. In transposed convolution, the zero-insertion preprocessing step makes the input feature maps sparse, which in turn results in many invalid (multiply-by-zero) operations. Most existing FPGA architectures cannot effectively tackle this issue. To address the challenges of implementing transposed convolution on FPGAs, we present an innovative dataflow design that applies the Winograd algorithm for fast processing with high resource efficiency. In addition, we propose an underlying hardware accelerator architecture featuring processing units (PUs) embedded in a parallel, pipelined, and buffered processing flow. In this paper, a parallelism-aware memory partition scheme is also exploited for bandwidth-efficient data access. Implementations of several state-of-the-art GANs with our approach achieve an average performance of 639.2 GOPS on a Xilinx ZCU102 FPGA device. Compared with an optimized conventional accelerator baseline, this work demonstrates an 8.6x (up to 11.7x) improvement in processing performance, versus below 2.2x improvement by other works in the literature.
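The sparsity problem mentioned above can be made concrete with a minimal sketch, not taken from the paper: a 1-D transposed convolution expressed as zero-insertion followed by an ordinary convolution, counting how many multiplications touch an inserted zero. All function names and the 1-D simplification are illustrative assumptions.

```python
# Illustrative sketch (hypothetical names): the zero-insertion view of
# transposed convolution, 1-D case, stride 2.

def zero_insert(x, stride):
    """Insert (stride - 1) zeros between neighbouring input elements."""
    out = []
    for i, v in enumerate(x):
        out.append(v)
        if i < len(x) - 1:
            out.extend([0] * (stride - 1))
    return out

def conv1d(x, w):
    """Plain valid convolution; also count multiplications whose
    input operand is zero (the 'invalid' operations)."""
    k = len(w)
    y, zero_mults = [], 0
    for i in range(len(x) - k + 1):
        acc = 0
        for j in range(k):
            if x[i + j] == 0:
                zero_mults += 1
            acc += x[i + j] * w[j]
        y.append(acc)
    return y, zero_mults

x = [1, 2, 3, 4]
w = [1, 1, 1]
up = zero_insert(x, stride=2)   # -> [1, 0, 2, 0, 3, 0, 4]
y, wasted = conv1d(up, w)       # 7 of the 15 multiplications hit a zero
```

Even in this tiny example, roughly half of the multiply operands are inserted zeros; at realistic feature-map sizes and strides this wasted work is what the proposed Winograd-based dataflow avoids.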