The data deluge in medical image processing demands faster and more efficient systems. Driven by recent advances in heterogeneous architectures, research on domain-specific accelerators has seen a resurgence. In this article, we develop an experimental system, SuperDragon, to evaluate the acceleration of EMAN, a single-particle cryo-electron microscopy (Cryo-EM) 3D reconstruction package, on a hybrid parallel architecture of CPUs, GPUs, and FPGAs. Based on a comprehensive workload characterization, we exploit multigrained parallelism in the Cryo-EM 3D reconstruction algorithm and investigate a suitable mapping of the computation onto the underlying heterogeneous architecture. The package is restructured with task-level (MPI), thread-level (OpenMP), and data-level (GPU and FPGA) parallelism. In particular, the proposed FPGA accelerator is a stream architecture that emphasizes optimizing computing-dominated data access patterns. Its configurable computing streams are constructed by arranging hardware modules and bypass channels into a deep linear pipeline. Compared with the six-core multicore program, the GPU and FPGA implementations achieve speedups of 8.4 and 2.25 times in execution time and improve power efficiency by factors of 7.2 and 14.2, respectively.
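To make the idea of a configurable computing stream more concrete, the following is a minimal software sketch, not the SuperDragon hardware design. It assumes that a stream is a fixed, linear sequence of stage slots and that a disabled slot simply forwards its input unchanged, which is one plausible reading of a bypass channel. The stage names (`scale`, `offset`, `square`), their operations, and the functions `run_stage` and `configure_stream` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Stage = Callable[[float], float]

# Hypothetical stage names and operations, chosen only for illustration;
# they do not correspond to modules of the actual accelerator.
STAGES: List[Tuple[str, Stage]] = [
    ("scale", lambda x: x * 2.0),
    ("offset", lambda x: x + 1.0),
    ("square", lambda x: x * x),
]


def run_stage(op: Stage, bypass: bool, stream: Iterable[float]) -> Iterator[float]:
    """One pipeline slot: apply op to each element, or forward it unchanged."""
    for item in stream:
        yield item if bypass else op(item)


def configure_stream(enabled: Dict[str, bool], source: Iterable[float]) -> Iterator[float]:
    """Chain every slot in fixed order; slots not enabled act as bypass channels."""
    stream: Iterable[float] = source
    for name, op in STAGES:
        stream = run_stage(op, not enabled.get(name, False), stream)
    return iter(stream)


if __name__ == "__main__":
    data = [1.0, 2.0]
    # All three stages active: ((x * 2) + 1) ** 2
    print(list(configure_stream({"scale": True, "offset": True, "square": True}, data)))
    # prints [9.0, 25.0]
    # "offset" bypassed: (x * 2) ** 2
    print(list(configure_stream({"scale": True, "square": True}, data)))
    # prints [4.0, 16.0]
```

Because the slots are chained as generators, each element flows through the whole pipeline one at a time, and reconfiguring the stream changes only which slots are bypassed, not the order of the stages.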