University of Twente Student Theses


Situational reinforcement learning : learning and combining local policies by using heuristic state preference values

Vrielink, S.B. (2006) Situational reinforcement learning : learning and combining local policies by using heuristic state preference values.

Abstract: This document describes an approach to reinforcement learning called situational reinforcement learning (SRL). The main goal of the approach is to reduce the computational cost of learning behaviour compared to conventional reinforcement learning; accordingly, one of the main goals of the research described in this document is to evaluate the effect of SRL on both the computational cost of learning and the optimality of the learned behaviour. The reduction in computational cost is achieved mainly by decomposing the environment into smaller environments, called situations, and learning behaviour, called a policy, for each situation separately. A global policy is then created by combining all learned situational policies. Each situation consists of states that share the same heuristic preference value. The learned behaviour of a situation will most likely direct the agent to a reachable, more favourable situation; the global policy created by combining the situational policies will therefore focus on continually reaching more favourable situations. The research not only evaluates situational reinforcement learning as a stand-alone approach to artificial intelligence (AI) learning, but also applies it as an addition to conventional reinforcement learning. The method that uses SRL as a stand-alone approach is referred to as the Combined method, and the method that uses it as an addition to conventional methods is referred to as the Enhanced method. Evaluation of the Combined method shows that it achieves significant reductions in computational cost. This reduction does not come without a price, however: the evaluation shows that careful consideration of the heuristic function is required to limit the loss of optimality.
The evaluation of the Enhanced method shows that, on average, when the modified policy iteration algorithm is used to learn policies, the computational cost of learning a global policy is greater than when the conventional method is used alone. I believe that the significant reduction in computational cost resulting from the use of SRL is a good incentive to perform further research on this approach.
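The decompose-and-combine idea described in the abstract can be sketched in a few lines. This is a minimal illustration, not the thesis's actual implementation: the line-world, the bucketing heuristic, and the "right" action are invented for the example, and the per-situation "learned" policy is a stand-in for an actual learning step such as modified policy iteration.

```python
from collections import defaultdict

def partition_into_situations(states, heuristic):
    """Group states into situations: sets of states sharing one heuristic preference value."""
    situations = defaultdict(set)
    for s in states:
        situations[heuristic(s)].add(s)
    return dict(situations)

def combine_policies(situational_policies):
    """Merge the per-situation policies into a single global policy.
    Situations partition the state space, so their policies cover disjoint states."""
    global_policy = {}
    for policy in situational_policies.values():
        global_policy.update(policy)
    return global_policy

# Toy example: a 10-state line world where higher-numbered states are preferred.
# The (hypothetical) heuristic buckets states into four situations of increasing preference.
states = list(range(10))
heuristic = lambda s: s // 3
situations = partition_into_situations(states, heuristic)

# Stand-in for situational learning: each situation's policy simply moves the
# agent right, i.e. toward a reachable, more favourable situation.
situational_policies = {
    h: {s: "right" for s in members} for h, members in situations.items()
}
global_policy = combine_policies(situational_policies)
print(sorted(situations))  # → [0, 1, 2, 3]
```

Because each situation is smaller than the full environment, learning a policy per situation and taking the union can be cheaper than learning one policy over all states at once, which is the cost reduction the abstract evaluates.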
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Interaction Technology MSc (60030)
Link to this item: https://purl.utwente.nl/essays/57365
