Energy demand can be predicted for different customer categories or geographic locations, e.g., for individual cities. Traditionally, these prediction tasks are solved independently, without sharing common problem-solving knowledge among them. However, solving one task may facilitate the training of another or improve its prediction performance through knowledge transfer. In this article, we propose a two-stage multitasking prediction (TS-MTP) framework for energy demand prediction over multiple locations, in which each task uses a deep neural network (DNN) model as its predictor. TS-MTP consists of a single-tasking learning (STL) stage and a multitasking learning (MTL) stage. In the STL stage, each prediction task is addressed independently with a gradient descent-based optimization algorithm until its training accuracy can no longer be improved, so that the optimal DNN structure parameters for each task are obtained. In the MTL stage, for a specified target task, the knowledge acquired in STL, i.e., the DNN connection weights and biases, is extracted from the source tasks and reused in the target task to further improve its prediction accuracy. To decide how much knowledge is reused, a coefficient is assigned to each source task, and particle swarm optimization is applied to obtain the optimal coefficients. The performance of TS-MTP is verified on several problem sets created from different step-ahead predictions, and its superiority is demonstrated in comparison with several state-of-the-art DNNs that are popular in the time-series prediction domain. The results show that TS-MTP can improve accuracy by more than 35% compared with STL without knowledge transfer.
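The two stages described above can be illustrated with a minimal sketch in Python with NumPy. It is not the authors' implementation, and several parts are assumptions made only for illustration: the synthetic sinusoidal "city" series, the one-hidden-layer network standing in for a DNN, the additive transfer rule (target parameters plus a coefficient-weighted sum of source parameters), the PSO hyperparameters, and all function names (`make_task`, `train_stl`, `pso`, `transfer`). For brevity the PSO fitness is the training error of the target task; a held-out validation set would be the more usual choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(phase, amp, n=200, lags=4):
    # Synthetic "city" load series (assumption): noisy daily sinusoid,
    # windowed into lagged inputs for one-step-ahead prediction.
    t = np.arange(n + lags)
    s = amp * np.sin(2 * np.pi * t / 24 + phase) + 0.05 * rng.normal(size=n + lags)
    X = np.stack([s[i:i + n] for i in range(lags)], axis=1)
    return X, s[lags:lags + n]

def unpack(theta, n_in, n_hidden):
    # Flat vector layout: W1, b1, W2, b2.
    i = n_in * n_hidden
    return (theta[:i].reshape(n_in, n_hidden), theta[i:i + n_hidden],
            theta[i + n_hidden:i + 2 * n_hidden], theta[-1])

def predict(theta, X, n_hidden):
    W1, b1, W2, b2 = unpack(theta, X.shape[1], n_hidden)
    return np.tanh(X @ W1 + b1) @ W2 + b2

def mse(theta, X, y, n_hidden):
    return float(np.mean((predict(theta, X, n_hidden) - y) ** 2))

def train_stl(X, y, n_hidden=8, lr=0.05, epochs=500):
    # STL stage: plain full-batch gradient descent on one task.
    n, n_in = X.shape
    theta = rng.normal(0, 0.3, n_in * n_hidden + 2 * n_hidden + 1)
    for _ in range(epochs):
        W1, b1, W2, b2 = unpack(theta, n_in, n_hidden)
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - y
        dH = np.outer(err, W2) * (1 - H ** 2)   # backprop through tanh
        grad = np.concatenate([(2 * X.T @ dH / n).ravel(), 2 * dH.mean(axis=0),
                               2 * H.T @ err / n, [2 * err.mean()]])
        theta = theta - lr * grad
    return theta

def pso(fitness, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    # Standard global-best PSO; particle 0 starts at "no transfer".
    pos = rng.uniform(-1, 1, (n_particles, dim))
    pos[0] = 0.0
    vel = np.zeros_like(pos)
    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    g, g_fit = pbest[pbest_fit.argmin()].copy(), pbest_fit.min()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = pos + vel
        fit = np.array([fitness(p) for p in pos])
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        if pbest_fit.min() < g_fit:
            g, g_fit = pbest[pbest_fit.argmin()].copy(), pbest_fit.min()
    return g, float(g_fit)

N_HIDDEN = 8
tasks = [make_task(0.0, 1.0), make_task(0.5, 0.8), make_task(1.0, 1.2)]
stl_params = [train_stl(X, y, N_HIDDEN) for X, y in tasks]   # STL stage

target = 0
X_t, y_t = tasks[target]
sources = [p for k, p in enumerate(stl_params) if k != target]

def transfer(coeffs):
    # Assumed transfer rule: add coefficient-weighted source parameters.
    return stl_params[target] + sum(c * p for c, p in zip(coeffs, sources))

# MTL stage: PSO searches one coefficient per source task.
stl_loss = mse(stl_params[target], X_t, y_t, N_HIDDEN)
best_c, mtl_loss = pso(lambda c: mse(transfer(c), X_t, y_t, N_HIDDEN), len(sources))
print("STL MSE:", stl_loss, "MTL MSE:", mtl_loss, "coefficients:", best_c)
```

Because one particle is initialized with all coefficients at zero, the coefficients returned by this sketch never give a higher fitness value than the STL parameters alone. How much they actually help depends on how similar the tasks are, and nothing in this toy setup reproduces the improvement reported in the article.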