University of Twente Student Theses


Situational reinforcement learning : learning and combining local policies by using heuristic state preference values

Vrielink, S.B. (2006) Situational reinforcement learning : learning and combining local policies by using heuristic state preference values.

Abstract: This document describes an approach to reinforcement learning called situational reinforcement learning (SRL). The main goal of the approach is to reduce the computational cost of learning behaviour compared with conventional reinforcement learning. One of the main goals of the research described in this document is to evaluate the effect of situational reinforcement learning on the computational cost of learning behaviour and on the optimality of the learned behaviour. The reduction in computational cost comes mainly from decomposing the environment into smaller environments, called situations, and learning behaviour, called a policy, separately for each situation. A global policy is then created by combining all learned situational policies. Each situation consists of states that have an equal heuristic preference value. The behaviour learned for a situation will most likely direct the agent to a reachable, more favourable situation. The global policy created by combining the situational policies will therefore focus on continually reaching more favourable situations. The research evaluates situational reinforcement learning not only as a stand-alone approach to artificial intelligence (AI) learning, but also as an addition to conventional reinforcement learning. The method that uses SRL as a stand-alone approach is referred to as the Combined method, and the method that uses it as an addition to conventional methods is referred to as the Enhanced method. Evaluation of the Combined method shows that it achieves significant reductions in computational cost. This reduction comes at a price, however: the evaluation shows that the heuristic function must be chosen carefully in order to limit the loss of optimality.
The evaluation of the Enhanced method shows that, on average, when the modified policy iteration algorithm is used to learn policies, the computational cost of learning a global policy is greater than when only the conventional method is used. I believe that the significant reduction in computational cost resulting from the use of SRL is a good incentive to perform further research on this approach.
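The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the corridor environment, the heuristic `state // 3`, the exit reward (the change in heuristic value when the agent leaves a situation) and the use of plain value iteration in place of modified policy iteration are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy environment: a deterministic corridor, goal at the right end.
N_STATES = 10
ACTIONS = (-1, +1)  # step left / step right
GAMMA = 0.9


def step(state, action):
    """Deterministic transition, clipped at the corridor ends."""
    return min(max(state + action, 0), N_STATES - 1)


def heuristic(state):
    """Assumed heuristic preference value: coarse progress towards the goal."""
    return state // 3


def split_into_situations(states, h):
    """Group states that share the same heuristic preference value."""
    situations = defaultdict(list)
    for s in states:
        situations[h(s)].append(s)
    return dict(situations)


def learn_situational_policy(members, h, sweeps=50):
    """Value iteration restricted to the states of one situation.

    Leaving the situation ends the local episode. The exit is rewarded with
    the change in heuristic value (an assumption of this sketch).
    """
    inside = set(members)
    values = {s: 0.0 for s in members}

    def backup(s, a):
        nxt = step(s, a)
        if nxt in inside:
            return GAMMA * values[nxt]
        return float(h(nxt) - h(s))

    for _ in range(sweeps):
        for s in members:
            values[s] = max(backup(s, a) for a in ACTIONS)
    # Greedy local policy with respect to the converged local values.
    return {s: max(ACTIONS, key=lambda a: backup(s, a)) for s in members}


def combine(policies):
    """Situations are disjoint, so the global policy is the union."""
    global_policy = {}
    for policy in policies:
        global_policy.update(policy)
    return global_policy


situations = split_into_situations(range(N_STATES), heuristic)
global_policy = combine(
    learn_situational_policy(members, heuristic)
    for members in situations.values()
)
```

In this toy case every local policy steps right, towards the next more favourable situation, and each local problem has at most three states instead of ten. With a coarser or misleading heuristic the combined policy can differ from the globally optimal one, which is the trade-off the abstract reports.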
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Human Media Interaction MSc (60030)
Link to this item: http://purl.utwente.nl/essays/57365

 
