This paper describes a reconfigurable 4-way SIMD engine fabricated in 45 nm high-k/metal-gate CMOS, targeted at on-die acceleration of vector processing in power-constrained mobile microprocessors. The SIMD accelerator can be reconfigured to perform 4-way 16b × 16b multiplies, a 32b × 32b multiply, 4-way 16b additions, 2-way 32b additions, or a 72b addition, with single-cycle throughput and a wide operating supply-voltage range (1.3 V to 230 mV). A reconfigurable 2 × 2 tile of signed 2's complement 16b multipliers, conditional carry gating in the 72b sparse tree adder, dual supplies for voltage hopping, and fine-grained power gating together enable a peak energy efficiency of 494 GOPS/W (measured at 300 mV, 50 °C) in a dense layout occupying 0.081 mm², while achieving: (i) scalable performance up to 2.8 GHz, 278 mW measured at 1.3 V; (ii) fast single-cycle switching between any operating/idle mode; (iii) configuration-dependent power reduction of up to 41% in total power and 6.5× in active leakage power; (iv) 10× standby leakage reduction during idle mode; (v) deep subthreshold operation measured at 230 mV, 8.8 MHz, 87 µW; and (vi) compensation for up to 3× performance variation in ultra-low voltage mode.
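As an illustration of why a 2 × 2 tile of 16b multipliers can serve both the 4-way 16b × 16b mode and the 32b × 32b mode, the following Python sketch shows the underlying arithmetic identity only. It is not a model of the fabricated circuit: the function names (`split_signed32`, `mul32_from_16`, `mul16x4`) are hypothetical, and treating the low operand halves as unsigned is an assumption made here so that the decomposition is exact for signed operands.

```python
MASK16 = 0xFFFF

def split_signed32(x):
    """Split a signed 32-bit value into (signed high half, unsigned low half)."""
    assert -(1 << 31) <= x < (1 << 31), "operand must fit in signed 32 bits"
    lo = x & MASK16   # low 16 bits, always in [0, 65535]
    hi = x >> 16      # arithmetic shift, so the sign stays in the high half
    return hi, lo     # invariant: x == hi * 65536 + lo

def mul32_from_16(a, b):
    """32b x 32b signed multiply built from four 16b x 16b partial products.

    Assumption for this sketch: each partial product stands in for one
    multiplier of a 2 x 2 tile, and the shifted sum stands in for the
    final wide addition.
    """
    ah, al = split_signed32(a)
    bh, bl = split_signed32(b)
    p_hh = ah * bh    # weight 2^32
    p_hl = ah * bl    # weight 2^16
    p_lh = al * bh    # weight 2^16
    p_ll = al * bl    # weight 2^0
    return (p_hh << 32) + ((p_hl + p_lh) << 16) + p_ll

def mul16x4(a_lanes, b_lanes):
    """4-way mode: four independent signed 16b x 16b products."""
    assert len(a_lanes) == len(b_lanes) == 4
    return [x * y for x, y in zip(a_lanes, b_lanes)]

if __name__ == "__main__":
    a, b = -123456789, 987654321
    print(mul32_from_16(a, b) == a * b)
    print(mul16x4([1, -2, 3, -4], [5, 6, -7, -8]))
```

In both modes the same four multiplications are performed; only the way their results are weighted and summed differs, which is consistent with the single-cycle mode switching described above.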