University of Twente Student Theses
Approximate dynamic programming with adaptive multivariate simplex splines
Jongh, Maike de (2021) Approximate dynamic programming with adaptive multivariate simplex splines.
PDF
2MB |
Abstract: | Large-scale Markov decision processes (MDPs) typically suffer from the curse of dimensionality, which renders exact solution methods intractable. For such problems, we rely upon approximate dynamic programming (ADP) techniques. A fundamental element of approximate dynamic programming is value function approximation. This research presents a value function approximation architecture that is based on adaptive multivariate simplex splines. Their linearity in the parameters and the fact that they are easily evaluated and adapted on a local basis renders these functions suitable candidates for this purpose. The approximation power of a multivariate simplex spline depends to a great extent on the triangulation on which it is defined. Our main contribution is a procedure that adaptively refines this triangulation in regions of the state space that require a more accurate value function approximation. This procedure is integrated into an ADP framework. The result is a method that can be applied to any MDP with a continuous (or large discrete) state space and finite discrete action space without the need of any preadjustment of the splines based on problem-specific knowledge. The method is tested on a problem of balancing an inverted pendulum. For this problem, the performance of our algorithm is not far behind that of a problem-specific method. The results indicate that multivariate simplex splines have high potential as value function approximators in an ADP context. |
Item Type: | Essay (Master) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Programme: | Applied Mathematics MSc (60348) |
Link to this item: | https://purl.utwente.nl/essays/86074 |
Export this item as: | BibTeX EndNote HTML Citation Reference Manager |
Repository Staff Only: item control page