Earth observation from satellite sensors offers the possibility to monitor natural ecosystems by deriving spatially explicit and temporally resolved biogeophysical parameters. Optical remote sensing, however, suffers from missing data, mainly due to clouds, sensor malfunctions, and atmospheric conditions. This study proposes a novel deep learning architecture for gap filling of satellite reflectances, specifically the visible and near-infrared bands, and illustrates its performance on high-resolution Sentinel-2 data. We introduce GANFilling, a generative adversarial network capable of sequence-to-sequence translation, which comprises convolutional long short-term memory layers to effectively exploit complete dependencies in space-time series data. We focus on Europe and evaluate the method's performance quantitatively (through distortion and perceptual metrics) and qualitatively (via visual inspection and visual quality metrics). Quantitatively, our model offers the best trade-off between denoising corrupted data and preserving noise-free information, underscoring the importance of considering multiple metrics jointly when assessing gap-filling tasks. Qualitatively, it successfully handles various noise sources, such as clouds and missing data, providing a robust solution across multiple scenarios and settings. We also illustrate and quantify the quality of the generated product in the relevant downstream application of vegetation greenness forecasting, where using GANFilling improves forecasting in approximately 70% of the considered regions in Europe. This research underlines the utility of deep learning for Earth observation data, enabling improved spatially and temporally resolved monitoring of the Earth's surface.
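The trade-off between denoising corrupted data and preserving noise-free information can be made concrete by scoring a reconstruction separately on gap pixels and on clear pixels. The sketch below is a hypothetical illustration in plain Python, not the evaluation protocol used in this study. It assumes a flat list of reflectances, a boolean gap mask (True where the observation was cloudy or missing), and plain RMSE as the distortion metric. The names `masked_rmse` and `gap_filling_errors` are invented for this example.

```python
import math


def masked_rmse(pred, target, mask, use_masked=True):
    """RMSE over the pixels whose mask value equals `use_masked`.

    Hypothetical helper: `pred`, `target`, and `mask` are equal-length
    flat sequences (reflectances and a boolean gap mask).
    Returns NaN if no pixel falls in the selected subset.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target, and mask must have the same length")
    squared = [
        (p - t) ** 2
        for p, t, m in zip(pred, target, mask)
        if bool(m) == use_masked
    ]
    if not squared:
        return float("nan")
    return math.sqrt(sum(squared) / len(squared))


def gap_filling_errors(pred, target, mask):
    """Report the error on filled gaps and on originally clear pixels separately."""
    return {
        # How well the gaps (clouds, missing data) were reconstructed.
        "gap_rmse": masked_rmse(pred, target, mask, use_masked=True),
        # How much noise-free input was distorted by the filler.
        "clear_rmse": masked_rmse(pred, target, mask, use_masked=False),
    }


if __name__ == "__main__":
    target = [0.1, 0.2, 0.3, 0.4]        # noise-free reflectances
    mask = [False, True, False, True]    # True marks a gap
    copy_input = [0.1, 1.0, 0.3, 1.0]    # corrupted input passed through unchanged
    smoothed = [0.15, 0.25, 0.25, 0.35]  # a filler that alters every pixel slightly
    print(gap_filling_errors(copy_input, target, mask))
    print(gap_filling_errors(smoothed, target, mask))
```

In this toy case, passing the corrupted input through unchanged is perfect on clear pixels but poor on gaps, while the smoothed output has a small error on both subsets. A single aggregate score would hide this difference, which is why the two quantities are reported jointly here.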