University of Twente Student Theses


Reinforcement learning for robot navigation in constrained environments

Barbero, Marta (2018) Reinforcement learning for robot navigation in constrained environments.

Abstract: Making a robot arm reach a target position with its end-effector in a constrained environment requires finding a trajectory from the initial configuration of the robot joints to the goal configuration while avoiding collisions with obstacles. A practical example is the environment in which a PIRATE robot (Pipe Inspection Robot for AuTonomous Exploration) operates. Although the manipulator can detect the environment and obstacles using its laser sensors (or camera), this knowledge is only approximate. One way to obtain a robust motion planner under these conditions is to use a movement policy learned with reinforcement learning algorithms. Reinforcement learning is an automatic learning technique that determines how an agent should select its actions, given the current state of the environment in which it is located, so as to maximize a predefined total reward. This project therefore investigates whether an agent, here a planar manipulator, can independently learn to navigate a constrained environment with obstacles using reinforcement learning techniques. The algorithms studied are SARSA and Q-learning. To that end, a MATLAB-based simulation environment and a physical setup were implemented, and tests were performed with different configurations. Analysis of the results shows that both algorithms allow the agent to autonomously learn the motion actions required to navigate inside constrained pipe-like environments. However, SARSA proved to be a more "conservative" approach than Q-learning: if there is a risk along the shortest path towards the goal (e.g. an obstacle), Q-learning will probably collide with it and then learn a policy exactly along that risky trajectory, to minimize the number of actions needed to reach the target.
SARSA, on the other hand, will try to avoid this path completely, preferring a longer but safer trajectory. Once a full path has been learned, the acquired knowledge can easily be applied to a similar but not identical pipe configuration, in a transfer learning perspective. In this way, the algorithms have been shown to adapt quickly to different pipe layouts and to different goal locations.
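The difference between the two algorithms can be illustrated with their standard tabular update rules. The following Python sketch is not taken from the thesis (whose simulation was MATLAB-based); the function names, the list-of-lists Q-table, and the values of `alpha` and `gamma` are illustrative assumptions. Q-learning bootstraps from the best action in the next state, whereas SARSA bootstraps from the action actually selected there, so a risky exploratory action (such as one leading to a collision) lowers the SARSA estimate but not the Q-learning one.

```python
# Minimal sketch of the textbook tabular updates (not the thesis code).
# Q is a list of rows, one row per state, one entry per action.
# alpha (learning rate) and gamma (discount factor) are assumed values.

def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the greedy action in s_next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


if __name__ == "__main__":
    # State 1 has a good action (value 1.0) and a hypothetical
    # "collision" action (value -5.0).
    q_table = [[0.0, 0.0], [1.0, -5.0]]
    sarsa_table = [[0.0, 0.0], [1.0, -5.0]]

    # Q-learning assumes the best action will be taken next.
    print(q_learning_update(q_table, s=0, a=0, r=0.0, s_next=1))

    # SARSA accounts for the exploratory step into the collision action.
    print(sarsa_update(sarsa_table, s=0, a=0, r=0.0, s_next=1, a_next=1))
```

In this toy case the Q-learning estimate for moving towards state 1 rises while the SARSA estimate falls, which is consistent with the "conservative" behaviour of SARSA described in the abstract.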
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 50 technical science in general, 52 mechanical engineering, 54 computer science
Programme: Systems and Control MSc (60359)

