This repository contains implementations of three reinforcement learning algorithms: Double Deep Q-Network (DDQN), Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC). These algorithms are applied to the Walker2d-v2 environment from OpenAI Gym, demonstrating various strategies for tackling a complex continuous-control task with a bipedal robot.
Follow these steps to set up the project environment and run the code:

- Clone this repository to your local machine:

  git clone https://github.com/MMahdiSetak/WalkerRL.git

- Navigate to the project directory and install the required libraries from the provided requirements.txt file:

  pip install -r requirements.txt

- Run the following command to train the agents with the default settings:

  python main.py
Set common hyperparameters in the BaseAgent class and algorithm-specific hyperparameters in the respective agent files.
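As a rough sketch, the shared hyperparameters might look like the following; the attribute names and values here are illustrative assumptions, not the repository's actual fields:

```python
# Illustrative sketch only: the real BaseAgent in this repository may use
# different attribute names and defaults; adjust them in the agent files.
class BaseAgent:
    def __init__(self):
        # Hyperparameters shared by all three agents (assumed examples)
        self.gamma = 0.99             # discount factor
        self.batch_size = 256         # minibatch size sampled from the replay buffer
        self.buffer_size = 1_000_000  # replay buffer capacity
        self.learning_rate = 3e-4     # optimizer step size
        self.tau = 0.005              # Polyak coefficient for target-network updates
```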
- DDQN (Double Deep Q-Network): mitigates value overestimation by decomposing the max operation in the target into action selection (by the online network) and action evaluation (by the target network). Because DDQN is better suited to discrete action spaces, our experiments focus on the two algorithms below.
- TD3 (Twin Delayed DDPG): addresses function approximation error in actor-critic methods by introducing twin Q-networks (bootstrapping from the minimum of the two estimates) and delayed policy updates.
- SAC (Soft Actor-Critic): an off-policy algorithm that optimizes a stochastic policy within an entropy-regularized reinforcement learning framework, encouraging a balance between exploration and exploitation. A sketch contrasting the three bootstrap targets follows this list.
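To make the differences concrete, here is a hedged PyTorch sketch of how each algorithm typically forms its bootstrap target; the function and argument names are illustrative assumptions, not this repository's API:

```python
# Illustrative target computations for DDQN, TD3, and SAC.
# Networks are passed in as callables; `done` is a float tensor (0.0 or 1.0).
import torch

def ddqn_target(q_online, q_target, reward, next_state, done, gamma=0.99):
    # Online network selects the action, target network evaluates it.
    next_action = q_online(next_state).argmax(dim=1, keepdim=True)
    next_q = q_target(next_state).gather(1, next_action)
    return reward + gamma * (1.0 - done) * next_q

def td3_target(critic1, critic2, actor_target, reward, next_state, done,
               gamma=0.99, noise_std=0.2, noise_clip=0.5):
    # Target policy smoothing: add clipped noise to the target action.
    next_action = actor_target(next_state)
    noise = (torch.randn_like(next_action) * noise_std).clamp(-noise_clip, noise_clip)
    next_action = (next_action + noise).clamp(-1.0, 1.0)
    # Twin critics: bootstrap from the minimum to curb overestimation.
    q1 = critic1(next_state, next_action)
    q2 = critic2(next_state, next_action)
    return reward + gamma * (1.0 - done) * torch.min(q1, q2)

def sac_target(critic1, critic2, policy, reward, next_state, done,
               gamma=0.99, alpha=0.2):
    # Entropy-regularized soft value: subtract alpha * log-probability.
    next_action, log_prob = policy(next_state)  # sampled action and its log-prob
    q = torch.min(critic1(next_state, next_action), critic2(next_state, next_action))
    return reward + gamma * (1.0 - done) * (q - alpha * log_prob)
```

In all three cases, the `(1 - done)` factor zeroes the bootstrap term at terminal states.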
The Walker2d-v2 environment challenges the agent to learn to walk forward without falling. It features continuous state and action spaces, requiring nuanced exploration strategies.
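The spaces can be inspected directly; below is a minimal sketch assuming the classic OpenAI Gym API (Walker2d-v2 also requires a working MuJoCo installation):

```python
# Minimal sketch using the classic (pre-Gymnasium) OpenAI Gym API.
import gym

env = gym.make("Walker2d-v2")
print(env.observation_space)  # continuous Box of joint positions and velocities
print(env.action_space)       # continuous Box of joint torques

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())  # one random step
env.close()
```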
The logging system tracks various training metrics, including rewards, episode lengths, and neural network losses. These metrics can be visualized using TensorBoard:
tensorboard --logdir=walker2d_tensorboard
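As a hedged sketch of how such metrics could be written with PyTorch's SummaryWriter (the tag names and loop are illustrative; only the log directory matches the command above):

```python
# Illustrative TensorBoard logging sketch using torch.utils.tensorboard.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="walker2d_tensorboard")
for episode in range(3):  # stand-in for the training loop
    episode_reward, episode_length, critic_loss = 10.0 * episode, 100, 0.5  # dummy values
    writer.add_scalar("rollout/episode_reward", episode_reward, episode)
    writer.add_scalar("rollout/episode_length", episode_length, episode)
    writer.add_scalar("train/critic_loss", critic_loss, episode)
writer.close()
```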
This section showcases the performance of the reinforcement learning agent in the Walker2d-v2 environment at various stages of training, from 200,000 to 1,000,000 iterations. The video below demonstrates the incremental learning process and the agent's improved ability to balance and walk as training progresses.
training.mp4
Parts of this codebase are inspired by and adapted from the Stable Baselines3 repository. I am grateful for their open-source contributions, which were instrumental in the development of this project.
This project is open source and available under the MIT License.