This work proposes a short-term load forecasting (STLF) method that combines the wavelet transform (WT) with a bidirectional gated recurrent unit (BGRU). The best wavelet basis is selected using the Shannon entropy cost function. Because entropy measures the average amount of information, Shannon entropy is used to select the nodes of the wavelet packet tree that carry more information. The best high- and low-frequency features selected in this way are fed to the BGRU for STLF. In addition, a new time-encoding approach, called cyclical encoding, is designed to model the periods and time patterns in the electrical load time series. The proposed best-tree wavelet packet transform bidirectional gated recurrent unit (BT-WPT-BGRU) method outperforms the wavelet transform and neuro-evolutionary algorithm (WT-NEA), wavelet and collaborative representation transforms (WACRT), convolutional and recurrent neural network (CARNN), WT-BGRU, full wavelet packet transform BGRU (FWPT-BGRU), BT-WPT-BGRU with one-hot encoding, and BT-WPT bidirectional LSTM (BT-WPT-BLSTM). On the ISONE dataset, BT-WPT-BGRU performs 71.7%, 58.8%, 58.2%, 17.6%, 12.5%, 12.5% and 6.6% better than WT-NEA, WACRT, CARNN, WT-BGRU, FWPT-BGRU, BT-WPT-BGRU (with one-hot encoding) and BT-WPT-BLSTM, respectively, in terms of the mean absolute percentage error (MAPE).
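
To make the idea of cyclical time encoding concrete, the following is a minimal sketch and not necessarily the encoding designed in this work. It assumes the common sine/cosine construction, in which each periodic time index is mapped to a point on the unit circle so that the last and first steps of a period (for example hour 23 and hour 0) are neighbours, which one-hot encoding does not express. The function names and the choice of periods (24 hours, 7 days) are illustrative assumptions.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic time index to a (sin, cos) point on the unit circle.

    Assumption: sine/cosine encoding; the source does not state its formula.
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def encode_timestamp(hour, day_of_week):
    """Build a 4-element time feature vector (hypothetical helper).

    hour: 0..23, day_of_week: 0..6. Both periods are assumptions.
    """
    hour_sin, hour_cos = cyclical_encode(hour, 24)
    dow_sin, dow_cos = cyclical_encode(day_of_week, 7)
    return [hour_sin, hour_cos, dow_sin, dow_cos]


def circle_distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


if __name__ == "__main__":
    h23 = cyclical_encode(23, 24)
    h00 = cyclical_encode(0, 24)
    h01 = cyclical_encode(1, 24)
    h12 = cyclical_encode(12, 24)
    # Adjacent hours are equally close, even across midnight.
    print(circle_distance(h23, h00), circle_distance(h00, h01))
    # Hours half a day apart are the farthest possible pair.
    print(circle_distance(h00, h12))
```

Under these assumptions, the encoded vector would be concatenated with the selected wavelet features at each time step before being passed to the recurrent model.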