Title:
Reinforcement learning control strategies : Q-learning, SARSA, and Double Q-learning performance for the cart-pole problem
Publisher Information:
Institute of Electrical and Electronics Engineers
Publication Year:
2025
Collection:
University of Malta: OAR@UM / L-Università ta' Malta
Document Type:
Conference object
Language:
English
Rights:
info:eu-repo/semantics/closedAccess ; The copyright of this work belongs to the author(s)/publisher. The rights of this work are as defined by the appropriate Copyright Legislation or as modified by any successive legislation. Users may access this work and make use of the information contained in it in accordance with the Copyright Legislation, provided that the author is properly acknowledged. Further distribution or reproduction in any format is prohibited without the prior permission of the copyright holder.
Accession Number:
edsbas.1AD441FA
Database:
BASE

Further Information

This paper compares the control strategies of three reinforcement learning algorithms, Q-learning, SARSA, and Double Q-learning, in the CartPole-v1 simulation environment using the OpenAI Gym framework. The comparison focuses on three critical performance metrics: average reward, stability, and sample efficiency. The results indicate that Q-learning achieves the highest total rewards overall but tends to be less stable, with fluctuations in performance throughout training. SARSA demonstrates greater stability at the expense of total rewards, showing more consistent behaviour across episodes due to its on-policy nature. Double Q-learning strikes a balance, reducing the overestimation bias seen in Q-learning and offering enhanced stability and improved sample efficiency compared to the other algorithms. The performance trade-offs between maximizing rewards, maintaining stability, and efficiently utilizing samples are discussed. These findings provide insights into selecting reinforcement learning algorithms for environments like the CartPole-v1 simulation, where balancing reward maximization and stability is essential. ; peer-reviewed
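
To make the comparison concrete, the sketch below shows the three update rules on a discretised CartPole-v1 state space. It is a minimal illustration rather than the authors' implementation: the bin counts, the clipped observation bounds, the hyperparameters (ALPHA, GAMMA, EPSILON), the episode count, and the classic Gym reset/step return signature are all assumptions made for this example.

import gym
import numpy as np

# Hypothetical sketch, not the paper's code. Assumes the classic OpenAI Gym API
# (obs = env.reset(); obs, reward, done, info = env.step(action)).
BINS = (6, 6, 12, 12)                      # bins per observation dimension (assumed)
LOW = np.array([-2.4, -3.0, -0.21, -3.0])  # clipped observation bounds (assumed)
HIGH = -LOW
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1     # assumed hyperparameters

def discretise(obs):
    # Map a continuous observation to a tuple of bin indices.
    ratios = (np.clip(obs, LOW, HIGH) - LOW) / (HIGH - LOW)
    return tuple((ratios * (np.array(BINS) - 1)).astype(int))

def q_learning_update(Q, s, a, r, s2):
    # Off-policy: bootstrap from the greedy (max) action value in the next state.
    Q[s][a] += ALPHA * (r + GAMMA * np.max(Q[s2]) - Q[s][a])

def sarsa_update(Q, s, a, r, s2, a2):
    # On-policy: bootstrap from the action a2 actually taken in the next state.
    Q[s][a] += ALPHA * (r + GAMMA * Q[s2][a2] - Q[s][a])

def double_q_update(QA, QB, s, a, r, s2, rng):
    # Select the greedy action with one table and evaluate it with the other,
    # which reduces the overestimation bias of plain Q-learning.
    if rng.random() < 0.5:
        best = int(np.argmax(QA[s2]))
        QA[s][a] += ALPHA * (r + GAMMA * QB[s2][best] - QA[s][a])
    else:
        best = int(np.argmax(QB[s2]))
        QB[s][a] += ALPHA * (r + GAMMA * QA[s2][best] - QB[s][a])

env = gym.make("CartPole-v1")
rng = np.random.default_rng(0)
Q = np.zeros(BINS + (env.action_space.n,))

for episode in range(500):
    s = discretise(env.reset())
    done, ret = False, 0.0
    while not done:
        # epsilon-greedy behaviour policy
        a = env.action_space.sample() if rng.random() < EPSILON else int(np.argmax(Q[s]))
        obs, r, done, _ = env.step(a)
        s2 = discretise(obs)
        q_learning_update(Q, s, a, r, s2)  # Q-learning shown; sarsa_update / double_q_update are the other variants
        s, ret = s2, ret + r
    if (episode + 1) % 100 == 0:
        print(f"episode {episode + 1}: return {ret}")

Running the same loop with each of the three update rules in turn, with everything else held fixed, yields the kind of reward, stability, and sample-efficiency comparison the abstract describes.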