The popularization of mobile phones and other portable multimedia devices has paved the way for the worldwide increase in video consumption. However, transmitting uncompressed video is impractical due to the high bandwidth it requires. To achieve significant compression rates, video codecs usually employ methods that degrade, to non-negligible levels, the visual quality perceived by the end user. Different deep-learning-based architectures have recently been proposed for Video Quality Enhancement (VQE). Still, most of them are trained and validated on videos generated by a single codec under fixed configurations. With the growing number of video coding formats and standards on the market, VQE methods that generalize across different contexts are desirable. This paper proposes a new VQE model based on the Spatio-Temporal Deformable Fusion (STDF) architecture that provides quality gains for videos compressed under different formats and standards, such as HEVC, VVC, VP9, and AV1. The results demonstrate that building the STDF model with videos from different coding standards and formats yields a significant increase in enhancement quality, with an average PSNR gain of up to 0.382 dB.