Approximate Guarantee for Approximate Dynamic Programming
Lardinois, W.T.P. (2019) Approximate Guarantee for Approximate Dynamic Programming.
Abstract: A Markov decision process (MDP) is a common way to model stochastic decision problems. Finding an optimal policy for an MDP is a challenging task, so approximation methods are used to obtain decent policies. Approximate dynamic programming (ADP) is one such approximation method. No approximate guarantees for ADP applied to MDPs exist yet. This thesis searches for an approximate guarantee for ADP by using an optimal stopping problem. Chen & Goldberg (2018) [10] obtained an approximation method and approximate guarantees for optimal stopping problems. A Markov chain over the policy space of the MDP is constructed to obtain an optimal stopping problem, denoted OS-MDP. Applying the method of Chen & Goldberg to OS-MDP yields error bounds for solution methods and approximation methods of MDPs. The estimate relates the policy found after N iterations to the policy obtained after M iterations, where N < M, and has an error bound of 1/(k+1), where k is a parameter that determines the complexity of the computations. The solution methods discussed are policy iteration, value iteration, and ADP. A card game called The Game [36] is used as a running example: it is modelled as an MDP, approximated using ADP, and an error bound for ADP is obtained via OS-MDP. One small numerical test of OS-MDP is performed, in which N = 5 and M = 10 yield an error bound of 0.25; hence no more than a 25% improvement can be achieved.
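As an illustration of the solution methods named in the abstract, below is a minimal value-iteration sketch on a tiny hypothetical MDP (two states, two actions; all numbers, variable names, and the use of NumPy are assumptions made for this example, not the thesis's own code or model of The Game). Note also that if the reported bound of 0.25 is an instance of the 1/(k+1) bound, it would correspond to k = 3.

    # Minimal value iteration on a hypothetical 2-state, 2-action MDP.
    # Illustrative sketch only; the MDP data below is invented.
    import numpy as np

    P = np.array([[[0.8, 0.2],    # P[a][s][s']: transition probabilities
                   [0.1, 0.9]],
                  [[0.5, 0.5],
                   [0.6, 0.4]]])
    R = np.array([[1.0, 0.0],     # R[a][s]: expected one-step rewards
                  [0.5, 2.0]])
    gamma = 0.9                   # discount factor

    V = np.zeros(2)
    for _ in range(1000):
        # Bellman optimality update:
        # V(s) = max_a [ R(a,s) + gamma * sum_s' P(a,s,s') V(s') ]
        Q = R + gamma * (P @ V)   # Q[a, s]
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < 1e-8:   # stop once values have converged
            V = V_new
            break
        V = V_new

    # Greedy policy with respect to the converged value function.
    greedy_policy = (R + gamma * (P @ V)).argmax(axis=0)
    print("V* ~", V, "greedy policy:", greedy_policy)

Policy iteration and ADP differ from this sketch in how they update the policy and approximate the value function, but all three methods iterate toward a fixed point of the same Bellman recursion.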
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 31 mathematics
Programme: Applied Mathematics MSc (60348)
Link to this item: https://purl.utwente.nl/essays/80296