Low-density parity-check (LDPC) codes have recently attracted considerable attention because of their excellent error-correcting performance. In this paper, we are interested in practical hardware implementations of LDPC code decoders. A direct fully parallel decoder implementation usually incurs hardware complexity that is too high for many real applications; partly parallel decoder design approaches that achieve appropriate trade-offs between hardware complexity and decoding throughput are therefore highly desirable. Applying a joint code and decoder design methodology, we develop a high-speed partly parallel decoder architecture for [inline-graphic not available: see fulltext]-regular LDPC codes, based on which we implement a [inline-graphic not available: see fulltext]-bit, rate-[inline-graphic not available: see fulltext] [inline-graphic not available: see fulltext]-regular LDPC code decoder on a Xilinx FPGA device. This partly parallel decoder supports a maximum symbol throughput of [inline-graphic not available: see fulltext] Mbps and achieves a BER of [inline-graphic not available: see fulltext] at 2 dB over an AWGN channel while performing a maximum of [inline-graphic not available: see fulltext] decoding iterations.