Emerging High-Performance Computing (HPC) workloads, such as graph analytics, machine learning, and big data science, are data-intensive. These workloads typically exhibit irregular memory footprints with limited data locality, and thus incur frequent cache misses and an ever-growing demand for memory bandwidth. Driven by this need, 3D-stacked memory devices such as the Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM) have been introduced to deliver significantly higher throughput. However, the traditional interfaces and optimization methods designed for JEDEC DDR devices cannot fully exploit the potential performance of 3D-stacked memory when handling the massive irregular memory accesses that accompany data-intensive applications.

3D-stacked memory devices (shown in Figure 1), such as High Bandwidth Memory (HBM) [1] and the Hybrid Memory Cube (HMC) [2], provide significantly higher bandwidth than conventional Double Data Rate synchronous Dynamic Random Access Memory (DDR DRAM), and thus offer an opportunity to better address the requirements of data-intensive applications. In these devices, the DRAM dies are stacked on top of a logic die via 3D packaging, and the logic layer implements the memory controller that manages the stacked DRAM. Well-known commercial products using this technology include the latest generations of NVIDIA's Graphics Processing Units (GPUs), Intel's Xeon Phi processors, and the Fujitsu PrimeHPC FX100.

One issue for data-intensive applications is the frequent formation of memory hotspots, due to the fine-grained nature of their data accesses. Memory hotspots are frequently accessed memory locations that can significantly hinder the performance of DRAM devices because of their banked design: repeated accesses to the same memory banks increase bank conflicts among memory operations, thus lengthening their latency [3]. Given the nondeterministic memory footprints of irregular applications, bank interleaving may not avoid hotspot formation as effectively as expected.
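To make the hotspot effect concrete, the following minimal C sketch (not taken from the paper; the bank count, line size, interleaving scheme, and access distribution are illustrative assumptions rather than parameters of any specific HBM/HMC device) counts how a skewed access stream maps onto interleaved banks. Because the hot lines fall into only a few banks, those banks absorb most of the traffic despite interleaving, which is exactly the conflict pattern described above.

/*
 * Illustrative sketch: low-order bank interleaving under a skewed
 * (irregular) access stream. Bank count, line size, and the 80/20
 * access split are assumptions chosen only for demonstration.
 */
#include <stdio.h>
#include <stdlib.h>

#define NUM_BANKS    16        /* assumed banks per channel            */
#define LINE_BYTES   64        /* assumed access (cache-line) granularity */
#define NUM_ACCESSES 1000000

/* Low-order interleaving: bank = (address / line size) mod #banks. */
static unsigned bank_of(unsigned long addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_BANKS);
}

int main(void)
{
    unsigned long counts[NUM_BANKS] = {0};
    srand(42);

    for (int i = 0; i < NUM_ACCESSES; i++) {
        unsigned long addr;
        /* Skewed stream: 80% of accesses touch a small hot region,
         * the remaining 20% are spread uniformly. */
        if (rand() % 100 < 80)
            addr = (unsigned long)(rand() % 4) * LINE_BYTES;  /* hot lines  */
        else
            addr = (unsigned long)rand() * LINE_BYTES;        /* cold lines */
        counts[bank_of(addr)]++;
    }

    /* Report per-bank access counts; a few banks dominate the total. */
    for (int b = 0; b < NUM_BANKS; b++)
        printf("bank %2d: %lu accesses\n", b, counts[b]);
    return 0;
}

Running this sketch shows a handful of banks receiving the large majority of accesses, while the rest stay nearly idle; in real DRAM such imbalance serializes requests behind the busy banks and lengthens their latency, as noted in [3].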