University of Twente Student Theses


Learning to explore and map with EKF SLAM and RL

Manen, B.T. van (2021) Learning to explore and map with EKF SLAM and RL.

Abstract: A fundamental aspect of research in mobile robots is autonomous navigation. When a Global Positioning System (GPS) is not available, Simultaneous Localization And Mapping (SLAM) provides a technique to infer the robot's position and build a map of the unknown environment. Extended Kalman Filter (EKF) SLAM is one of the most popular filter-based SLAM methods. This thesis implements a novel Reinforcement Learning (RL) based EKF SLAM algorithm for the exploration of static unknown environments. The proposed approach uses two hierarchically structured RL policies: one generates sensing locations and the other generates trajectories parameterized by 3rd-order Bézier curves. Additionally, the thesis explores how EKF and RL can be combined in a single framework to minimize the EKF uncertainties and improve the map accuracy. While navigating in an unknown environment, EKF SLAM combined with a Light Detection And Ranging (LIDAR) sensor provides the robot with an estimate of its current pose along with the estimated positions of the environment's landmarks. A map is built in the form of an occupancy grid by merging the estimated robot pose with the LIDAR data. The high-level RL network then selects informative sensing locations based on the occupancy grid and the current LIDAR output; it assists the robot with obstacle avoidance by only selecting sensing locations near the robot. To reach these sensing locations, a second RL network is trained to compute an obstacle-free trajectory that minimizes the uncertainty in the EKF estimates. This low-level RL agent shapes the trajectory by modifying the positions of the Bézier control points. The thesis explores two continuous reward functions to minimize the EKF uncertainty. The first reward function computes the sum of the diagonal elements of the Kalman gain.
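As a minimal sketch of the trajectory parameterization described above, a 3rd-order Bézier curve can be sampled from its four control points; the endpoints fix the start and goal of the trajectory, while a low-level agent would shape the path by moving the two inner points. The function name, control-point values, and sample count here are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=20):
    """Sample n points on a 3rd-order Bezier curve.

    p0 and p3 fix the start and goal of the trajectory; a
    low-level policy would shape the path by moving p1 and p2.
    """
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

# Example: start at (0, 0), goal at (4, 0); the inner points lift the path.
path = cubic_bezier(np.array([0.0, 0.0]), np.array([1.0, 2.0]),
                    np.array([3.0, 2.0]), np.array([4.0, 0.0]))
```

Because the curve always passes through its first and last control points, only the two inner points need to be adjusted to explore different obstacle-free paths between the same start and goal.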
The second reward function directly computes the sum of the diagonal elements of the EKF covariance matrix, which holds the variances of the EKF estimates. To cover obstacle avoidance in the learned policy, a final reward function is studied that adds a discrete penalty upon collision with an obstacle. Finally, a path planner discretizes the chosen trajectory into a fixed number of points serving as short-term goals for a PD controller, which computes the appropriate motor control commands. The RL- and SLAM-based approach is implemented and tested in a 2D Python environment. Results show that the low-level policy does manage to select trajectories that reduce the EKF uncertainty, but it has more difficulty handling obstacle avoidance. The high-level policy did not manage to perform better than random sensing locations when evaluated on map coverage: the occupancy grids and the reward function likely did not provide enough information for the high-level policy to converge, and longer training is also necessary. The trajectory generation can be improved by adding a reward function based on the ground-truth positions of the robot and the landmarks. Additionally, obstacle avoidance can be better incorporated into the reward function by using an exponential of the distance between the robot and the closest obstacle as a penalty. The high-level policy can be further improved by providing larger occupancy grids to the RL network, since these locate obstacles and unexplored areas more accurately. A reward function based on ground-truth maps is also expected to be more effective than the simple visited-landmark percentage used so far.
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 50 technical science in general, 52 mechanical engineering, 53 electrotechnology
Programme: Electrical Engineering MSc (60353)

