Reinforcement learning control strategies: Q-learning, SARSA, and Double Q-learning performance for the cart-pole problem
This paper compares three reinforcement learning control strategies, Q-learning, SARSA, and Double Q-learning, in the CartPole-v1 simulation environment of the OpenAI Gym framework. The comparison focuses on three critical performance metrics: average reward, stability, and sample efficiency. The results indicate that Q-learning achieves the highest total rewards overall but tends to be less stable, with fluctuations in performance throughout training. SARSA demonstrates greater stability at the expense of total rewards, showing more consistent behaviour across episodes due to its on-policy nature. Double Q-learning strikes a balance, reducing the overestimation bias seen in Q-learning and offering enhanced stability and improved sample efficiency compared to the other algorithms. The performance trade-offs between maximizing rewards, maintaining stability, and efficiently utilizing samples are discussed. These findings provide insights into selecting reinforcement learning algorithms for environments like the CartPole-v1 simulation, where balancing reward maximization and stability is essential.

Peer-reviewed.
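The distinction the abstract draws (off-policy Q-learning, on-policy SARSA, and Double Q-learning's bias reduction) comes down to how each algorithm forms its bootstrap target. A minimal sketch of the three tabular update rules, shown on a single transition rather than the full CartPole-v1 training loop (the hyperparameters and the toy state/action spaces here are illustrative assumptions, not values from the paper):

```python
import random

# Assumed illustrative hyperparameters, not taken from the paper.
ALPHA, GAMMA = 0.1, 0.99
ACTIONS = [0, 1]  # e.g. push-left / push-right in CartPole

def q_learning_update(Q, s, a, r, s2):
    # Off-policy: bootstrap from the greedy (max) action in s2.
    # Using max of noisy estimates is the source of the
    # overestimation bias mentioned in the abstract.
    target = r + GAMMA * max(Q[s2][b] for b in ACTIONS)
    Q[s][a] += ALPHA * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s2, a2):
    # On-policy: bootstrap from the action a2 actually taken in s2,
    # which yields the more consistent episode-to-episode behaviour
    # the abstract attributes to SARSA.
    target = r + GAMMA * Q[s2][a2]
    Q[s][a] += ALPHA * (target - Q[s][a])

def double_q_update(QA, QB, s, a, r, s2):
    # Double Q-learning: one table selects the argmax action,
    # the other evaluates it, decoupling selection from evaluation
    # and so reducing overestimation.
    if random.random() < 0.5:
        QA, QB = QB, QA  # randomly choose which table to update
    best = max(ACTIONS, key=lambda b: QA[s2][b])
    target = r + GAMMA * QB[s2][best]
    QA[s][a] += ALPHA * (target - QA[s][a])
```

In a full CartPole-v1 run, the continuous four-dimensional observation would first be discretized into bins before these tabular updates apply; the functions above only isolate the target computation that differentiates the three algorithms.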