University of Twente Student Theses
Analyzing the convergence of Q-learning through Markov decision theory
Overmars, M.G. (2021) Analyzing the convergence of Q-learning through Markov decision theory.
PDF (1MB)
Abstract: In this thesis the convergence of Q-learning is analyzed. Q-learning is a reinforcement learning method that has been studied extensively, and convergence has been shown in several settings. Existing convergence proofs rely on multiple assumptions about the algorithm. This is most evident when function approximation is introduced, which can lead to instability and divergence. The assumptions needed to retain convergence are very strict, which makes them undesirable when applying the algorithm. We provide an alternative method of analyzing Q-learning that takes into account the properties of the underlying Markov decision process (MDP). This method is applied to MDPs that are akin to a birth-death process. Multiple variations of these processes are considered, and convergence results are shown for each of them. Additionally, we show how to extend these results to a larger class of MDPs. Finally, the convergence results are examined numerically. While our method does not cover the case of a general MDP, it is a first step toward a different approach to the problem of convergence, one that does not rely on the aforementioned assumptions. As such, this method may be able to circumvent the issues that other methods encounter.
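For context, the standard tabular Q-learning update that such convergence questions concern is Q(s,a) ← Q(s,a) + α(r + γ max_{a'} Q(s',a') − Q(s,a)). Below is a minimal, illustrative sketch of this update on a hypothetical birth-death-style chain; the chain length, reward, and transition probabilities are assumptions chosen for demonstration, not the processes analyzed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical birth-death-style chain: states 0..N, actions step left or right.
# All parameters below are illustrative assumptions, not the thesis's MDPs.
N = 10          # right boundary of the chain
GAMMA = 0.9     # discount factor
ALPHA = 0.1     # constant learning rate
EPS = 0.1       # epsilon-greedy exploration rate
P_MOVE = 0.8    # probability the chosen step succeeds; otherwise the state stays

def step(state, action):
    """Birth-death dynamics: action in {-1, +1} moves with probability P_MOVE."""
    nxt = min(max(state + action, 0), N) if rng.random() < P_MOVE else state
    reward = 1.0 if nxt == N else 0.0  # reward only at the right boundary
    return nxt, reward

Q = np.zeros((N + 1, 2))  # Q[state, action_index]; index 0 -> -1, index 1 -> +1

state = 0
for _ in range(50_000):
    # Epsilon-greedy action selection.
    a_idx = rng.integers(2) if rng.random() < EPS else int(np.argmax(Q[state]))
    nxt, reward = step(state, 2 * a_idx - 1)
    # Tabular Q-learning update:
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[state, a_idx] += ALPHA * (reward + GAMMA * Q[nxt].max() - Q[state, a_idx])
    state = 0 if nxt == N else nxt  # restart at the left end after reaching the goal

print(np.round(Q, 3))  # learned action values; the +1 action should dominate
```

Note that classical convergence results for this iteration require, among other things, diminishing step sizes satisfying the Robbins-Monro conditions; such requirements are part of the restrictive assumptions the abstract refers to.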
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 31 mathematics
Programme: Applied Mathematics MSc (60348)
Link to this item: https://purl.utwente.nl/essays/88829