Lock synchronization overheads may be significant in a shared-memory multiprocessor system-on-a-chip (SoC) implementation. These overheads appear as lock latency, lock delay and memory bandwidth consumption in the system. There has been much previous work to speed up access to lock variables via specialized caches [1], software queues [2]-[5] and delayed loops, e.g., exponential backoff [2]. However, in the context of SoC, these previously reported techniques all have drawbacks not present in our technique. We present a novel, efficient, small and very simple hardware unit, the SoC Lock Cache (SoCLC), which resolves critical section (CS) interactions among multiple processors and reduces lock latency, lock delay and bandwidth consumption in a shared-memory multiprocessor SoC. Our mechanism handles short CSs as well as long CSs. This combined support is established at both the hardware architecture level and the software architecture level, including real-time operating system (RTOS) kernel-level facilities such as support for preemptive versus non-preemptive synchronization, scheduling of lock variable accesses, interrupt handling and RTOS initialization. Experimental results for a microbenchmark program, which simulates an application with high-contention critical section accesses on a four-processor shared-memory platform, show an overall speedup of 55%. Furthermore, a database application example with client-server pairs of tasks, run on the same platform, shows that our mechanism achieves an overall speedup of 27%.
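To make the lock-acquisition cost concrete, the following is a minimal sketch of how a processor might acquire and release a lock through a memory-mapped lock unit. The base address, the lock-indexing scheme and the read-as-atomic-test-and-set convention are assumptions made for illustration only; they are not the actual SoCLC interface defined in the paper.

```c
#include <stdint.h>

/* Hypothetical memory-mapped lock region; the address and access
 * semantics below are illustrative assumptions, not the SoCLC spec. */
#define LOCK_BASE ((volatile uint32_t *)0xF0000000u)

/* Acquire lock `id`: a read of the lock word is assumed to atomically
 * test-and-set it, returning 0 only when the lock was free (and is now
 * owned by the caller). */
static inline void lock_acquire(unsigned id)
{
    while (LOCK_BASE[id] != 0) {
        /* Busy-wait; a hardware lock unit could instead notify waiting
         * processors (e.g., via interrupt) to cut spin traffic. */
    }
}

/* Release lock `id`: writing 0 is assumed to clear the lock bit. */
static inline void lock_release(unsigned id)
{
    LOCK_BASE[id] = 0;
}
```

In a conventional shared-memory implementation, the spin loop above generates repeated bus traffic under contention, which is exactly the lock latency, lock delay and bandwidth cost the abstract refers to; moving the test-and-set state into a small dedicated hardware unit is the kind of remedy the paper proposes.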