With the growing prevalence of high-dynamic-range (HDR) display devices, demand is rising rapidly for converting existing standard-dynamic-range television (SDRTV) video content into its HDR television (HDRTV) counterpart. Herein, we propose a two-stage learning paradigm with hybrid attention mechanisms that fully exploits spatial, channel-wise, and regional correlations to drive this conversion faithfully. Specifically, in the first, domain-mapping stage, we propose a depthwise self-attention mechanism and a global calibration layer, which adaptively exploit intra-feature relationships to construct better scene representations and achieve appealing SDRTV-to-HDRTV transformation. In the second, highlight-generation stage, considering that overexposed regions suffer from detail loss and therefore pose a major challenge to the conversion, we propose a regional self-attention module that specifically restores the missing highlights. Extensive experiments on public databases show that our method outperforms state-of-the-art approaches across multiple quality evaluation measures.
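The sketch below illustrates, in skeletal form, how a two-stage pipeline of this kind could be organized: a domain-mapping stage built from channel-attentive and globally calibrated blocks, followed by a highlight-generation stage that operates only on an overexposure mask. All module designs, names, and thresholds here are assumptions for illustration, not the paper's actual layer definitions.

```python
# Minimal PyTorch sketch of the two-stage idea described in the abstract.
# Every design detail below (projections, pooling, the 0.95 clipping threshold,
# normalized [0, 1] HDR output) is a hypothetical placeholder.
import torch
import torch.nn as nn


class GlobalCalibration(nn.Module):
    """Channel recalibration from globally pooled statistics (assumed design)."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))           # global average pooling -> per-channel weights
        return x * w.unsqueeze(-1).unsqueeze(-1)  # rescale each channel


class DepthwiseSelfAttention(nn.Module):
    """Channel self-attention with depthwise Q/K/V projections (assumed design)."""
    def __init__(self, channels):
        super().__init__()
        self.qkv = nn.Conv2d(channels, channels * 3, 1, groups=channels)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)     # depthwise 1x1 projections
        q = q.reshape(b, c, h * w)
        k = k.reshape(b, c, h * w)
        v = v.reshape(b, c, h * w)
        # Attention across channels (c x c), capturing feature intra-relationships.
        attn = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)
        return (attn @ v).reshape(b, c, h, w) + x


class TwoStageSDRtoHDR(nn.Module):
    """Stage 1: global domain mapping; Stage 2: highlight restoration on a mask."""
    def __init__(self, channels=32):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.stage1 = nn.Sequential(
            DepthwiseSelfAttention(channels), GlobalCalibration(channels))
        self.stage2 = nn.Sequential(
            nn.Conv2d(channels + 1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, sdr):
        feat = self.stage1(self.head(sdr))
        # Overexposure mask from SDR luma; values above 0.95 are treated as clipped.
        mask = (sdr.mean(dim=1, keepdim=True) > 0.95).float()
        # Residual refinement applied only inside the overexposed regions.
        feat = feat + self.stage2(torch.cat([feat, mask], dim=1)) * mask
        return torch.sigmoid(self.tail(feat))     # HDR estimate, assumed normalized to [0, 1]


hdr = TwoStageSDRtoHDR()(torch.rand(1, 3, 64, 64))  # -> tensor of shape (1, 3, 64, 64)
```

The key structural point the sketch tries to convey is the separation of concerns: the first stage performs a global SDR-to-HDR mapping over the whole frame, while the second stage is gated by an overexposure mask so that highlight restoration is confined to the regions where detail was actually lost.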