University of Twente Student Theses
Analyzing the convergence of Q-learning through Markov decision theory
Overmars, M.G. (2021) Analyzing the convergence of Q-learning through Markov decision theory.
Abstract: | This thesis analyzes the convergence of Q-learning, a reinforcement learning method that has been studied extensively and shown to converge in multiple cases. Existing convergence proofs rely on several assumptions about the algorithm. This is most evident when function approximation is introduced, which can make the method unstable or divergent. The assumptions needed to still obtain convergence are very strict, which makes them undesirable when applying the algorithm. We provide an alternative method of analyzing Q-learning that takes into account the properties of the underlying Markov decision process (MDP). This method is applied to MDPs that are akin to a birth-death process. Multiple variations of these processes are considered, and convergence results are shown for each of them. Additionally, we show how to extend these results to a larger class of MDPs. Finally, the convergence results are examined numerically. While our method does not cover the case of a general MDP, it is a first step in a different approach to the convergence problem, one that does not rely on the aforementioned assumptions. As such, this method may be able to circumvent the issues that other methods encounter. |
Item Type: | Essay (Master) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 31 mathematics |
Programme: | Applied Mathematics MSc (60348) |
Link to this item: | https://purl.utwente.nl/essays/88829 |
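For readers unfamiliar with the setting named in the abstract, the following is a minimal sketch of tabular Q-learning on a small birth-death-like MDP, where the state can only move to a neighbouring state in each step. It is not taken from the thesis: the function name `q_learning_birth_death`, the transition probabilities, the costs, the discount factor, and the 1/n step size are all assumptions chosen for illustration.

```python
import random


def q_learning_birth_death(n_states=5, steps=20000, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical birth-death-like MDP (illustrative only)."""
    rng = random.Random(seed)
    # Assumed model: each action fixes the probability of moving one state up.
    up_prob = (0.7, 0.3)       # action 0 = slow service, action 1 = fast service
    action_cost = (0.0, 0.5)   # assumed extra cost for the fast action
    q = [[0.0, 0.0] for _ in range(n_states)]
    visits = [[0, 0] for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = 0 if q[s][0] >= q[s][1] else 1
        # Birth-death dynamics: move to a neighbour, staying put at the boundaries.
        if rng.random() < up_prob[a]:
            s_next = min(s + 1, n_states - 1)
        else:
            s_next = max(s - 1, 0)
        # Reward is the negative of a holding cost plus the action cost.
        r = -float(s) - action_cost[a]
        # Standard Q-learning update with step size 1 / (visit count of (s, a)).
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a]
        q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
        s = s_next
    return q


if __name__ == "__main__":
    for state, row in enumerate(q_learning_birth_death()):
        print(state, [round(v, 2) for v in row])
```

Because every reward in this assumed model lies in [-4.5, 0] and the discount factor is 0.9, each Q-value stays between -45 and 0 throughout the run, which gives a quick sanity check on the update rule.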