End-to-end network slicing is a new concept for 5G+ networks, dividing the network into slices dedicated to different types of services and customized for their tasks. A key task, in this context, is satisfying service level agreements (SLAs) by forecasting how many resources to allocate to each slice. The increasing complexity of the problem setup, due to service, traffic, SLA, and network algorithm diversity, makes resource allocation a daunting task for traditional (model-based) methods. Hence, data-driven methods have recently been explored. Although such methods excel at the application level (e.g., for image classification), applying them to wireless resource allocation is challenging. Not only are the required latencies significantly lower (e.g., for resource block allocation per OFDM frame), but also the cost of transferring raw data across the network to centrally process it with a heavy-duty Deep Neural Network (DNN) can be prohibitive. For this reason, Distributed DNN (DDNN) architectures have been considered, where a subset of DNN layers is executed at the edge (in the 5G network) to reduce latency and communication overhead. If it is deemed that a "good enough" allocation has been produced locally, the additional latency and communication are avoided; if not, intermediate features produced at the edge are sent through additional DNN layers (in a central cloud). In this paper, we propose a distributed DNN architecture for this task based on LSTM, which excels at forecasting demands with long-term dependencies, aiming to avoid under-provisioning and minimize over-provisioning. We investigate (i) joint training (offline) of the local and remote layers, and (ii) optimizing the (online) decision mechanism that determines whether a sample is resolved locally or offloaded to the remote layers. Using a real dataset, we demonstrate that our architecture resolves nearly 50% of decisions at the edge with no additional SLA penalty compared to centralized models.
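The early-exit mechanism described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the names (`LOCAL_THRESHOLD`, `edge_forward`, `cloud_forward`, `allocate`) and the stand-in computations inside them are assumptions; in the actual architecture, the edge and cloud stages would be trained LSTM layers and the confidence measure would come from the optimized decision mechanism.

```python
# Hypothetical sketch of a distributed-DNN early exit for slice resource
# allocation: resolve at the edge when confident, otherwise offload the
# intermediate features to remote (cloud) layers.

LOCAL_THRESHOLD = 0.8  # assumed confidence cutoff for accepting the edge result

def edge_forward(sample):
    """Stand-in for the edge (local) DNN layers.

    Returns a local allocation forecast, a confidence score for it, and
    the intermediate features that would be offloaded if needed.
    """
    allocation = sum(sample) / len(sample)              # toy local forecast
    confidence = 1.0 - (max(sample) - min(sample))      # toy confidence proxy
    features = [x * 0.5 for x in sample]                # intermediate features
    return allocation, confidence, features

def cloud_forward(features):
    """Stand-in for the remote (cloud) DNN layers, fed intermediate features."""
    return 2.0 * sum(features) / len(features)          # toy remote forecast

def allocate(sample):
    """Resolve at the edge if confident enough; otherwise offload to the cloud."""
    allocation, confidence, features = edge_forward(sample)
    if confidence >= LOCAL_THRESHOLD:
        return allocation, "edge"   # no extra latency or communication
    return cloud_forward(features), "cloud"
```

The key design point is that only low-confidence samples pay the communication and latency cost of the cloud path, and they transmit compact intermediate features rather than raw data.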