Reinforcement learning control strategies: Q-learning, SARSA, and Double Q-learning performance for the cart-pole problem
This paper compares three reinforcement learning control strategies, Q-learning, SARSA, and Double Q-learning, in the CartPole-v1 simulation environment of the OpenAI Gym framework. The comparison focuses on three critical performance metrics: average reward, stability, and sample efficiency. The results indicate that Q-learning achieves the highest total rewards overall but tends to be less stable, with fluctuations in performance throughout training. SARSA demonstrates greater stability at the expense of total rewards, showing more consistent behaviour across episodes due to its on-policy nature. Double Q-learning strikes a balance, reducing the overestimation bias seen in Q-learning and offering enhanced stability and improved sample efficiency compared to the other algorithms. The performance trade-offs between maximizing rewards, maintaining stability, and efficiently utilizing samples are discussed. These findings provide insights into selecting reinforcement learning algorithms for environments like the CartPole-v1 simulation, where balancing reward maximization and stability is essential.

Peer-reviewed.
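The distinction the abstract draws (off-policy Q-learning, on-policy SARSA, and Double Q-learning's bias reduction) comes down to how each algorithm forms its bootstrap target. A minimal sketch of the three tabular update rules, shown on a single transition rather than the full CartPole-v1 training loop (the hyperparameters and the toy state/action spaces here are illustrative assumptions, not values from the paper):

```python
import random

# Assumed illustrative hyperparameters, not taken from the paper.
ALPHA, GAMMA = 0.1, 0.99
ACTIONS = [0, 1]  # e.g. push-left / push-right in CartPole

def q_learning_update(Q, s, a, r, s2):
    # Off-policy: bootstrap from the greedy (max) action in s2.
    # Using max of noisy estimates is the source of the
    # overestimation bias mentioned in the abstract.
    target = r + GAMMA * max(Q[s2][b] for b in ACTIONS)
    Q[s][a] += ALPHA * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s2, a2):
    # On-policy: bootstrap from the action a2 actually taken in s2,
    # which yields the more consistent episode-to-episode behaviour
    # the abstract attributes to SARSA.
    target = r + GAMMA * Q[s2][a2]
    Q[s][a] += ALPHA * (target - Q[s][a])

def double_q_update(QA, QB, s, a, r, s2):
    # Double Q-learning: one table selects the argmax action,
    # the other evaluates it, decoupling selection from evaluation
    # and so reducing overestimation.
    if random.random() < 0.5:
        QA, QB = QB, QA  # randomly choose which table to update
    best = max(ACTIONS, key=lambda b: QA[s2][b])
    target = r + GAMMA * QB[s2][best]
    QA[s][a] += ALPHA * (target - QA[s][a])
```

In a full CartPole-v1 run, the continuous four-dimensional observation would first be discretized into bins before these tabular updates apply; the functions above only isolate the target computation that differentiates the three algorithms.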