University of Twente Student Theses
Universal robot policy : Using a surrogate model in combination with TRPO
Erens, J. (2024) Universal robot policy : Using a surrogate model in combination with TRPO.
Abstract: | This thesis lays the foundation for designing a universal policy for a locomotion task. It does so by considering a single robot from the RoboGrammar design space, consisting of a number of bodies with unknown density and length. The policy is trained with reinforcement learning, using the Trust Region Policy Optimisation (TRPO) algorithm. The dynamics of the robot are modelled with two surrogates, a generalised Polynomial Chaos Expansion (PCE) and a model ensemble, to investigate whether training the controller on a surrogate model offers advantages over training on the real environment. The results show that PCE cannot yet model the dynamics accurately, although the method is promising. A more practical problem with PCE is the computation time it requires. For both reasons, PCE was not used in combination with TRPO. The model ensemble surrogate was combined with TRPO, but it failed to train a successful policy. Training on the original environment from the RoboGrammar library instead of a surrogate model, however, showed promising results for a universal policy. In future research, this work can be extended to a larger variety of robots that have not only unknown density and length but also an unknown shape and number of bodies. |
Item Type: | Essay (Master) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 52 mechanical engineering |
Programme: | Systems and Control MSc (60359) |
Link to this item: | https://purl.utwente.nl/essays/99780 |