This paper presents a unified parallel radix-16 turbo decoder ASIC for 3GPP-LTE and WiMAX systems. A radix-16 decoding for both binary and duo-binary turbo codes is proposed to reduce complexity as well as critical path delay. In addition, the two distinct interleavers in the standards are implemented with low-complexity address generator and barrel shift networks. Furthermore, quad-bank memory partition facilitates parallel radix-16 decoding without address conflict. Fabricated in TSMC 65nm CMOS process, the ASIC attains 691Mbps throughput running at 512MHz and 5.5 iterations. For the 326.4Mbps LTE peak data rate, it consumes only 193mW at 0.9V supply voltage with unprecedented energy efficiency of 0.108nJ/bit/iteration.