University of Twente Student Theses
Reinforcement-learning-based navigation for autonomous mobile robots in unknown environments
Alhawary, Mohammed (2018) Reinforcement-learning-based navigation for autonomous mobile robots in unknown environments.
PDF, 27MB
Abstract: | Mobile robot navigation in an unknown environment is an important issue in autonomous robotics. Current approaches to the navigation problem, such as roadmap, cell decomposition, and potential field methods, assume complete knowledge of the navigation environment. However, complete knowledge of the environment can hardly be obtained in practical applications, where obstacle locations and surface friction properties are unknown. Navigation in an unknown environment can instead be phrased as a reinforcement learning (RL) problem, because the optimal navigation plan can only be discovered through trial-and-error interaction with the environment. The goal of this project is to control a skid-steering mobile robot so that it navigates in an unknown environment with obstacles and a slippery floor using reinforcement learning techniques. The main task studied is navigation to a goal location in the shortest time while avoiding obstacles and overcoming skidding effects. The standard (model-free) Q-learning algorithm is widely used to discover optimal trajectories in unknown navigation environments, but it converges to these trajectories undesirably slowly. The Dyna-Q approach extends Q-learning with online-constructed models of the environment properties (obstacles, slippage, etc.), resulting in a model-based Q-learning platform (a schematic sketch of this learning loop is given below the item record). In this thesis, we examine the use of a multinomial probabilistic model to describe the state-transition probabilities of the system dynamics. Moreover, two ideas to improve the learning performance of the Dyna-Q approach are suggested. The first is to prioritize the simulated learning experiences around the shortest discovered navigation trajectories between the initial and target states. The second is to learn a parametric kinematic model of the robot motion that can be used to simulate the motion characteristics of the robot in unvisited locations. It was shown that utilizing the models to simulate learning experiences makes the robot more robust to the stochastic effects caused by skidding. The experimental results showed that the model-based RL algorithms converge to sub-optimal policies faster and with higher success rates than model-free Q-learning. Model-based algorithms therefore have better long-term performance, with fewer divergences from the discovered sub-optimal trajectories. |
Item Type: | Essay (Master) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 50 technical science in general |
Programme: | Systems and Control MSc (60359) |
Link to this item: | https://purl.utwente.nl/essays/76349 |
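As background to the Dyna-Q approach named in the abstract, the following is a minimal sketch of the tabular Dyna-Q loop: each real interaction updates the Q-table and a learned one-step model, and the model is then replayed for a number of simulated planning updates. The grid world, obstacle set, and hyperparameters below are hypothetical illustrations, and the sketch uses the standard deterministic table model rather than the multinomial transition model, prioritized simulation, or parametric kinematic model developed in the thesis.

```python
# Illustrative tabular Dyna-Q sketch for goal-directed grid navigation.
# Environment and parameters are hypothetical, not the thesis's robot setup.
import random
from collections import defaultdict

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]        # grid moves: right, left, down, up
GRID, GOAL, OBSTACLES = 10, (9, 9), {(3, 3), (3, 4), (6, 7)}
ALPHA, GAMMA, EPSILON, PLANNING_STEPS, EPISODES = 0.1, 0.95, 0.1, 20, 300

Q = defaultdict(float)        # Q[(state, action)] -> value estimate
model = {}                    # model[(state, action)] -> (reward, next_state)

def step(state, action):
    """Toy deterministic dynamics: moves are blocked by obstacles and borders."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if nxt in OBSTACLES or not (0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID):
        nxt = state                                  # bump: stay in place
    reward = 1.0 if nxt == GOAL else -0.01
    return reward, nxt

def choose(state):
    """Epsilon-greedy action selection over the tabular Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for _ in range(EPISODES):
    state = (0, 0)
    while state != GOAL:
        action = choose(state)
        reward, nxt = step(state, action)            # real experience
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        model[(state, action)] = (reward, nxt)       # record experience in the model
        for _ in range(PLANNING_STEPS):              # simulated (planning) updates
            s, a = random.choice(list(model))
            r, s2 = model[(s, a)]
            best = max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])
        state = nxt
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: each real step is amplified by PLANNING_STEPS model-based updates, which is the mechanism behind the faster convergence reported in the abstract.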