Reconfigurable computing (RC) theory aims to take advantage of the flexibility of general-purpose processors (GPPs) alongside the performance of application specific integrated circuits (ASICs). Numerous RC architectures have been proposed since the 1960s, but all are struggling to become mainstream. The main factor that prevents RC to be used in general-purpose CPUs, GPUs, and mobile devices is that it requires extensive knowledge of digital circuit design which is lacked in most software programmers. In an RC development, a processor cooperates with a reconfigurable hardware accelerator (HA) which is usually implemented on a field-programmable gate arrays (FPGAs) chip and can be reconfigured dynamically. It implements crucial portions of software (kernels) in hardware to increase overall performance, and its design requires substantial knowledge of digital circuit design. In this paper, a novel RC architecture is proposed that provides the exact same instruction set that a standard general-purpose RISC microprocessor (e.g., ARM Cortex-M0) has while automating the generation of a tightly coupled RC component to improve system performance. This approach keeps the decades-old assemblers, compilers, debuggers and library components, and programming practices intact while utilizing the advantages of RC. The proposed architecture employs the LLVM compiler infrastructure to translate an algorithm written in a high-level language (e.g., C/C++) to machine code. It then finds the most frequent instruction pairs and generates an equivalent RC circuit that is called miniature accelerator (MA). Execution of the instruction pairs is performed by the MA in parallel with consecutive instructions. Several kernel algorithms alongside EEMBC CoreMark are used to assess the performance of the proposed architecture. Performance improvement from 4.09% to 14.17% is recorded when HA is turned on. There is a trade-off between core performance and combination of compilation time, die area, and program startup load time which includes the time required to partially reconfigure an FPGA chip.