This repository contains implementations of three reinforcement learning algorithms: Double Deep Q-Network (DDQN), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC). These algorithms are applied to the Walker2d-v2 environment from OpenAI Gym, demonstrating various strategies for tackling a complex continuous-control task with a bipedal robot.
Follow these steps to set up the project environment and run the code:

- Clone this repository to your local machine:

  git clone https://github.com/MMahdiSetak/WalkerRL.git

- Navigate to the project directory and install the required libraries from the provided requirements.txt file:

  pip install -r requirements.txt

- Run the following command to train the agents with the default settings:

  python main.py
Set common hyperparameters in the BaseAgent class and algorithm-specific hyperparameters in the respective agent files.
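As a rough sketch, the shared hyperparameters might look like the following; the attribute names and values here are illustrative assumptions, not the repository's actual fields:

```python
# Illustrative sketch only: the real BaseAgent in this repository may use
# different attribute names and defaults; adjust them in the agent files.
class BaseAgent:
    def __init__(self):
        # Hyperparameters shared by all three agents (assumed examples)
        self.gamma = 0.99             # discount factor
        self.batch_size = 256         # minibatch size sampled from the replay buffer
        self.buffer_size = 1_000_000  # replay buffer capacity
        self.learning_rate = 3e-4     # optimizer step size
        self.tau = 0.005              # Polyak coefficient for target-network updates
```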
- DDQN (Double Deep Q-Network): mitigates value overestimation by decomposing the max operation in the target into action selection (by the online network) and action evaluation (by the target network). Because DDQN is better suited to discrete action spaces, our experiments focus on the two algorithms below.
- TD3 (Twin Delayed DDPG): addresses function approximation error in actor-critic methods by introducing twin Q-networks (bootstrapping from the minimum of the two estimates) and delayed policy updates.
- SAC (Soft Actor-Critic): an off-policy algorithm that optimizes a stochastic policy within an entropy-regularized reinforcement learning framework, encouraging a balance between exploration and exploitation. A sketch contrasting the three bootstrap targets follows this list.
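To make the differences concrete, here is a hedged PyTorch sketch of how each algorithm typically forms its bootstrap target; the function and argument names are illustrative assumptions, not this repository's API:

```python
# Illustrative target computations for DDQN, TD3, and SAC.
# Networks are passed in as callables; `done` is a float tensor (0.0 or 1.0).
import torch

def ddqn_target(q_online, q_target, reward, next_state, done, gamma=0.99):
    # Online network selects the action, target network evaluates it.
    next_action = q_online(next_state).argmax(dim=1, keepdim=True)
    next_q = q_target(next_state).gather(1, next_action)
    return reward + gamma * (1.0 - done) * next_q

def td3_target(critic1, critic2, actor_target, reward, next_state, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    # Target policy smoothing: add clipped noise to the target action.
    next_action = actor_target(next_state)
    noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
    next_action = (next_action + noise).clamp(-1.0, 1.0)
    # Twin critics: bootstrap from the minimum to curb overestimation.
    q1 = critic1(next_state, next_action)
    q2 = critic2(next_state, next_action)
    return reward + gamma * (1.0 - done) * torch.min(q1, q2)

def sac_target(critic1, critic2, policy, reward, next_state, done,
               gamma=0.99, alpha=0.2):
    # Entropy-regularized soft value: subtract alpha * log-probability.
    next_action, log_prob = policy(next_state)  # sampled action and its log-prob
    q = torch.min(critic1(next_state, next_action), critic2(next_state, next_action))
    return reward + gamma * (1.0 - done) * (q - alpha * log_prob)
```

In all three cases, the `(1 - done)` factor zeroes the bootstrap term at terminal states.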
The Walker2d-v2 environment challenges the agent to learn to walk forward without falling. It features continuous state and action spaces, requiring nuanced exploration strategies.
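The spaces can be inspected directly; below is a minimal sketch assuming the classic OpenAI Gym API (Walker2d-v2 also requires a working MuJoCo installation):

```python
# Minimal sketch using the classic (pre-Gymnasium) OpenAI Gym API.
import gym

env = gym.make("Walker2d-v2")
print(env.observation_space)  # continuous Box of joint positions and velocities
print(env.action_space)       # continuous Box of joint torques

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())  # one random step
env.close()
```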
The logging system tracks various training metrics, including rewards, episode lengths, and neural network losses. These metrics can be visualized using TensorBoard:
tensorboard --logdir=walker2d_tensorboard
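As a hedged sketch of how such metrics could be written with PyTorch's SummaryWriter (the tag names and loop are illustrative; only the log directory matches the command above):

```python
# Illustrative TensorBoard logging sketch using torch.utils.tensorboard.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="walker2d_tensorboard")
for episode in range(3):  # stand-in for the training loop
    episode_reward, episode_length, critic_loss = 10.0 * episode, 100, 0.5  # dummy values
    writer.add_scalar("rollout/episode_reward", episode_reward, episode)
    writer.add_scalar("rollout/episode_length", episode_length, episode)
    writer.add_scalar("train/critic_loss", critic_loss, episode)
writer.close()
```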
This section showcases the performance of the reinforcement learning agent in the Walker2d-v2 environment at various stages of training, from 200,000 to 1,000,000 iterations. The video below demonstrates the incremental learning process and the agent's improved ability to balance and walk as training progresses.
training.mp4
Parts of this codebase are inspired by and adapted from the Stable Baselines3 repository. I am grateful for their open-source contributions, which were instrumental in the development of this project.
This project is open source and available under the MIT License.