Solving an order acceptance sequential decision-making problem with Q-learning

Author(s): Calvino Sobrido, Raul (2024)

Abstract:
In this thesis, we explore how a sequential decision-making problem for a fast-moving consumer goods delivery company can be solved using Reinforcement Learning methods. We implement an offline tabular Q-learning algorithm that learns the optimal policy for a given combination of state variables and point in time of the day. Additionally, we present a simulation environment in which the Q-learning algorithm learns the policy, and we compare the performance of the Q-learning agent with that of a company-derived policy. Based on this comparison, we present a series of recommendations to the company on what conclusions can be drawn from the policy derived by the Q-learning algorithm.
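
As an illustration of the kind of method the abstract describes, the following is a minimal sketch of tabular Q-learning trained against a simulator. It is not the thesis's implementation: the state layout (time of day, remaining capacity, value of the incoming order), the reward, the constants HORIZON and CAPACITY, and the functions step, train and greedy_action are all hypothetical and chosen only to keep the example small.

```python
import random
from collections import defaultdict

# All constants below are hypothetical and for illustration only.
HORIZON = 8              # decision epochs in one simulated day
CAPACITY = 3             # orders that can still be delivered at the start of the day
ORDER_VALUES = (1, 2, 5) # possible values of an incoming order
ACTIONS = (0, 1)         # 0 = reject the order, 1 = accept the order


def step(state, action, rng):
    """Toy simulator. A state is (time, remaining capacity, order value)."""
    t, cap, value = state
    reward = 0.0
    if action == 1 and cap > 0:
        # Accepting earns the order value and consumes one unit of capacity.
        reward = float(value)
        cap -= 1
    t += 1
    done = t >= HORIZON
    next_state = (t, cap, rng.choice(ORDER_VALUES))
    return next_state, reward, done


def train(episodes=5000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration on the toy simulator."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to an estimated return
    for _ in range(episodes):
        state = (0, CAPACITY, rng.choice(ORDER_VALUES))
        done = False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action, rng)
            # Standard Q-learning target: no bootstrapping past the end of the day.
            if done:
                target = reward
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_action(q, state):
    """Read the learned policy: the action with the highest Q-value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

Calling train() and then greedy_action(q, (t, cap, value)) for each state gives a lookup table of accept or reject decisions per time of day, which is the form in which a learned policy could be compared against a fixed rule-based policy in the same simulator.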

Document(s):

CalvinoSobrido_BA_BMS.pdf