University of Twente Student Theses

Reinforcement learning based approach for the navigation of a pipe-inspection robot at sharp pipe corners

Zeng, Xiangshuai (2019) Reinforcement learning based approach for the navigation of a pipe-inspection robot at sharp pipe corners.

PDF (18MB)

Abstract:	PIRATE (Pipe Inspection Robot for AuTonomous Exploration) is a robot currently being developed by the Robotics & Mechatronics (RaM) research group at the University of Twente. This thesis designs and investigates a reinforcement learning (RL) based approach for navigating PIRATE through sharp pipe corners. The overall movement of PIRATE is broken down into two separate parts: the process for the front part of the robot and the process for the rear part of the robot, each simulated with a 4-DOF robotic arm whose anatomy is similar to that of half of PIRATE. A laser scanner is installed on the end-effector in simulation so that the arm can perceive its surrounding environment. Specifically, reinforcement learning is used to develop the path planner for the front part of PIRATE, and the training task is formulated as having the 4-DOF robotic arm reach a given target inside pipe-like obstacles. Furthermore, two supplementary approaches are developed: the first lets the robot locate a target point in the pipe by itself, and the second is a navigation policy for the rear part of PIRATE to move through pipe corners. The RL algorithm used in this thesis is Proximal Policy Optimization (PPO), with deep artificial neural networks as the function approximators. Most of the experiments are done in simulation, where the software environment is built with the Robot Operating System (ROS) and the Gazebo simulator. In addition, a real robotic setup is built for evaluating the proposed approaches in the real world. In the experiments, the performance of the RL agent is explored under a torque control scheme and a position control scheme. The resulting agent is found to generalize to navigating the robot in multiple different environments, and the laser data plays an important role in whether the agent can find an optimal policy.
In addition, the RL agent generally performs better when the robot is controlled with torque commands than with position commands, but including information from the past and an extra penalty on the change of the joint positions helps improve the agent's performance under position control. Next, the two supplementary approaches are evaluated and both prove effective, with an acceptable success rate. Finally, the proposed approaches are assessed on the real robotic setup to observe the differences between simulation and the real world.
Item Type:	Essay (Master)
Faculty:	EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:	54 computer science
Programme:	Systems and Control MSc (60359)
Link to this item:	https://purl.utwente.nl/essays/79790