University of Twente Student Theses
Average-reward reinforcement learning in two-player two-action games
Duijl, N. van (2024) Average-reward reinforcement learning in two-player two-action games.
PDF (2MB) |
Abstract: | Reinforcement learning (RL) algorithms are typically designed for stationary environments. However, as these algorithms are increasingly applied in real-world settings, more situations occur in which they interact with each other, such as algorithmic pricing. In this scenario, multiple agents try to learn optimal prices to maximise profit in a market. It is detrimental to consumers if these agents learn to cooperate by all setting high prices. Using average-reward RL makes more sense in these settings than using the discounted counterpart. We compare the use of average-reward methods to the discounted counterpart used in [1] in the prisoner's dilemma, stag hunt and snowdrift games with one-period memory. We analyse the differences in the individual best-response graphs, the basins of attraction of the Nash equilibria, and the effect of exploration. From our results, we conclude that average-reward RL gives very similar results to discounted-reward RL in these two-player two-action games. Therefore, it could be possible to apply average-reward RL in multi-agent settings without much change in the Nash equilibria. [1] Janusz M. Meylahn and Lars Janssen. Limiting Dynamics for Q-Learning with Memory One in Symmetric Two-Player, Two-Action Games. Complexity, 2022:e4830491, November 2022. Publisher: Hindawi. |
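The contrast the abstract draws between the two reward criteria can be sketched with the two tabular Q-learning update rules. This is a minimal, hypothetical illustration (the state in a memory-one two-action game would be the previous joint action); it is not the thesis's actual implementation, and all names and parameter values are assumptions.

```python
def discounted_update(q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard discounted Q-learning: future rewards are shrunk by gamma."""
    target = r + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])


def average_reward_update(q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """Differential (average-reward) Q-learning: instead of discounting,
    the running average-reward estimate rho is subtracted from each reward,
    so Q-values measure reward relative to the long-run average."""
    delta = r - rho + max(q[s_next].values()) - q[s][a]
    q[s][a] += alpha * delta
    rho += beta * delta  # track the average reward with a slower step size
    return rho
```

In a two-player two-action game with one-period memory, `s` would range over the four previous joint actions and `a` over the two actions, giving each player a 4x2 Q-table.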
Item Type: | Essay (Bachelor) |
Faculty: | EEMCS: Electrical Engineering, Mathematics and Computer Science |
Subject: | 31 mathematics, 54 computer science |
Programme: | Applied Mathematics BSc (56965) |
Link to this item: | https://purl.utwente.nl/essays/100753 |