In road transportation, long-distance routes require scheduled driving times, breaks, and rest periods, in compliance with the regulations on working conditions for truck drivers, while ensuring goods are delivered within the time windows of each customer. However, routes are subject to uncertain travel and service times, and incidents may cause additional delays, making predefined schedules ineffective in many real-life situations. This paper presents a reinforcement learning (RL) algorithm capable of making en-route decisions regarding driving times, breaks, and rest periods under uncertain conditions. Our proposal aims to maximize the likelihood of on-time delivery while complying with drivers' work regulations. We use an online model-based RL strategy that needs no prior training and is more flexible than model-free RL approaches, where the agent must be trained offline before making online decisions. Our proposal combines model predictive control with a rollout strategy and Monte Carlo tree search. At each decision stage, our algorithm anticipates the consequences of all possible decisions over a number of future stages (the lookahead horizon), and then uses a base policy to generate a sequence of decisions beyond the lookahead horizon. This base policy could be, for example, a set of decision rules based on the experience and expertise of the transportation company covering the routes. Our numerical results show that the policy obtained using our algorithm outperforms not only the base policy (by up to 83%), but also a policy obtained offline using deep Q-networks (DQN), a state-of-the-art, model-free RL algorithm.
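The following is a minimal, self-contained sketch of the kind of online lookahead-plus-rollout decision procedure the abstract describes, not the paper's implementation: it enumerates feasible en-route decisions over a short lookahead horizon, samples uncertain travel times via Monte Carlo, and completes the remaining stages with a simple rule-based base policy. All names, parameters, the toy driver regulation, and the delay distribution below are assumptions for illustration; the paper's combination with model predictive control and Monte Carlo tree search is not reproduced here.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    remaining_drive: float   # planned driving hours still needed to reach the customer
    clock: float             # elapsed time since departure (hours)
    since_break: float       # continuous driving since the last break or rest
    deadline: float          # end of the customer's delivery time window

DECISIONS = ("drive", "break", "rest")

def is_feasible(s: State, d: str) -> bool:
    # Toy regulation (assumed): at most 4.5 h of continuous driving before a break.
    return not (d == "drive" and s.since_break >= 4.5)

def step(s: State, d: str, rng: random.Random) -> State:
    if d == "drive":
        # One planned hour of driving plus a random delay (uncertain travel time).
        actual = 1.0 + rng.expovariate(5.0)
        return replace(s, remaining_drive=s.remaining_drive - 1.0,
                       clock=s.clock + actual, since_break=s.since_break + actual)
    if d == "break":
        return replace(s, clock=s.clock + 0.75, since_break=0.0)
    return replace(s, clock=s.clock + 11.0, since_break=0.0)   # daily rest

def done(s: State) -> bool:
    return s.remaining_drive <= 0.0

def terminal_cost(s: State) -> float:
    return max(0.0, s.clock - s.deadline)   # lateness beyond the time window

def base_policy(s: State) -> str:
    # Rule-of-thumb policy standing in for the company's decision rules.
    return "drive" if is_feasible(s, "drive") else "break"

def rollout_base(s: State, rng: random.Random) -> float:
    # Beyond the lookahead horizon, follow the base policy to the end of the route.
    while not done(s):
        s = step(s, base_policy(s), rng)
    return terminal_cost(s)

def lookahead(s: State, depth: int, rng: random.Random) -> float:
    if done(s):
        return terminal_cost(s)
    if depth == 0:
        return rollout_base(s, rng)
    feasible = [d for d in DECISIONS if is_feasible(s, d)]
    return min(lookahead(step(s, d, rng), depth - 1, rng) for d in feasible)

def decide(s: State, horizon: int = 3, samples: int = 30, seed: int = 0) -> str:
    """Pick the first decision whose sampled lookahead cost is lowest."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for d in DECISIONS:
        if not is_feasible(s, d):
            continue
        cost = sum(lookahead(step(s, d, rng), horizon - 1, rng)
                   for _ in range(samples)) / samples
        if cost < best_cost:
            best, best_cost = d, cost
    return best

if __name__ == "__main__":
    start = State(remaining_drive=9.0, clock=0.0, since_break=0.0, deadline=12.0)
    print(decide(start))   # en-route decision recommended for the current stage
```

In this sketch the full lookahead tree is enumerated, which is only tractable for short horizons; the tree-search and receding-horizon elements described in the abstract would replace that exhaustive expansion in practice.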