
ElegantRL is lightweight, efficient, and stable, built for researchers and practitioners.
- Lightweight: the core code is fewer than 1,000 lines, built on PyTorch, OpenAI Gym, and NumPy.
- Efficient: performance is comparable with Ray RLlib.
- Stable: as stable as Stable Baselines3.
Model-free deep reinforcement learning (DRL) algorithms:
- DDPG, TD3, SAC, A2C, PPO(GAE) for continuous actions
- DQN, DoubleDQN, D3QN for discrete actions
For algorithm details, please check out OpenAI Spinning Up.
More policy-gradient (Actor-Critic style) algorithms are listed in Policy gradient algorithms.
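For orientation, here is a minimal sketch of the one-step actor-critic update that these policy-gradient methods build on. It is not taken from elegantrl/net.py or elegantrl/agent.py; the network sizes and the fixed Gaussian exploration noise are placeholder choices.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

state_dim, action_dim = 8, 2
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim))
critic = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_optim = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(state, action, reward, next_state, done, gamma=0.99):
    # critic: regress the value toward the one-step TD target
    with torch.no_grad():
        td_target = reward + gamma * (1.0 - done) * critic(next_state)
    value = critic(state)
    critic_loss = (td_target - value).pow(2).mean()
    critic_optim.zero_grad(); critic_loss.backward(); critic_optim.step()

    # actor: policy gradient weighted by the (detached) advantage
    advantage = (td_target - critic(state)).detach()
    log_prob = Normal(actor(state), 1.0).log_prob(action).sum(dim=1, keepdim=True)
    actor_loss = -(advantage * log_prob).mean()
    actor_optim.zero_grad(); actor_loss.backward(); actor_optim.step()

# example call with random tensors, batch size 32
b = 32
update(torch.randn(b, state_dim), torch.randn(b, action_dim),
       torch.randn(b, 1), torch.randn(b, state_dim), torch.zeros(b, 1))
```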
----- kernel files -----
elegantrl/net.py    # Neural networks.
elegantrl/agent.py  # Model-free RL algorithms.
elegantrl/main.py   # Run and learn DEMO 1 ~ 3 in run__demo().
----- utility files -----
elegantrl/env.py          # A gym env or a custom env (e.g. MultiStockEnv for finance).
Examples.ipynb            # Run and learn DEMO 1 ~ 3 in a Jupyter notebook (new version).
ElegantRL-examples.ipynb  # Run and learn DEMO 1 ~ 3 in a Jupyter notebook (old version).
Results using ElegantRL
BipedalWalkerHardcore is a difficult task with a continuous action space. Only a few RL implementations can reach the target reward.
Check out a video on bilibili: Crack the BipedalWalkerHardcore-v2 with total reward 310 using IntelAC.
Necessary:
| Python 3.7
| PyTorch 1.0.2
Not necessary:
| NumPy 1.19.0    | For the ReplayBuffer. NumPy is installed automatically with PyTorch.
| gym 0.17.2      | For the RL training env. Gym provides standard environments for DRL training.
| box2d-py 2.3.8  | For gym. Use pip install Box2D (instead of box2d-py).
| matplotlib 3.2  | For plots that evaluate the agent's performance.
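A quick way to confirm the environment is set up is to import the packages above and print their versions. This snippet is just a convenience check, not part of ElegantRL:

```python
# Sanity check that the dependencies listed above are importable;
# version numbers are only printed, not enforced.
import torch, numpy, gym, matplotlib

print("PyTorch:   ", torch.__version__)
print("NumPy:     ", numpy.__version__)
print("Gym:       ", gym.__version__)
print("Matplotlib:", matplotlib.__version__)
```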
It is lightweight.
python3 Main.py
# You can see run__demo(gpu_id=0, cwd='AC_BasicAC') in Main.py.
- By default, it will train a stable DDPG on LunarLanderContinuous-v2 for 2,000 seconds.
- It chooses CPU or GPU automatically. Don't worry, I never use .cuda().
- It saves the log and model parameter files in the current working directory cwd='AC_BasicAC'.
- It prints the total reward while training. Maybe I should use TensorBoardX?
- There are many comments in the code. I believe these comments can answer some of your questions.
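The automatic device selection mentioned above boils down to a pattern like the following. This is a minimal sketch of the idea, not the exact code in Main.py:

```python
# Minimal sketch of automatic CPU/GPU selection (no hard-coded .cuda()).
import torch

gpu_id = 0  # e.g. the gpu_id passed to run__demo(gpu_id=0, ...)
device = torch.device(f"cuda:{gpu_id}" if torch.cuda.is_available() else "cpu")

net = torch.nn.Linear(4, 2).to(device)                          # move networks with .to(device)
state = torch.as_tensor([0.1, 0.2, 0.3, 0.4], device=device)    # and tensors too
print(device, net(state))
```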
The following steps:
- See run__xxx() in Main.py.
- Use run__zoo() to run an off-policy algorithm. Use run__ppo() to run an on-policy algorithm such as PPO.
- Choose a DRL algorithm: from Agent import AgentXXX.
- Choose a gym environment: args.env_name = "LunarLanderContinuous-v2".
- Initialize the hyper-parameters using args.
- Initialize agent = AgentXXX(): create the DRL agent based on the algorithm.
- Initialize buffer = ReplayBuffer(): store the transitions.
- Initialize evaluator = Evaluator(): evaluate and save the trained model.
- After training starts, the while-loop breaks when the conditions are met (achieving the target score, reaching the maximum number of steps, or a manual break).
Inside the loop, three calls do the work:
- agent.update_buffer(...): the agent explores the environment within the target number of steps, generates transition data, and stores it in the ReplayBuffer. Runs in parallel.
- agent.update_policy(...): the agent uses a batch sampled from the ReplayBuffer to update the network parameters. Runs in parallel.
- evaluator.evaluate_and_save(...): evaluates the agent's performance and keeps the model with the highest score. Independent of the training process.
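The sketch below mirrors this training loop with self-contained stub classes so it runs end-to-end. The real AgentXXX, ReplayBuffer, and Evaluator in ElegantRL have different constructors and method signatures; the stubs only reproduce the call pattern described above (they do not learn anything), and the snippet assumes the classic gym 0.17 reset/step API listed in the requirements.

```python
# Toy illustration of the training loop: explore -> update -> evaluate.
import gym
import numpy as np


class StubAgent:
    def update_buffer(self, env, buffer, target_step):
        # explore the env for target_step steps and store transitions
        state, steps = env.reset(), 0
        while steps < target_step:
            action = env.action_space.sample()             # random policy stand-in
            next_state, reward, done, _ = env.step(action)
            buffer.append((state, action, reward, done))
            state = env.reset() if done else next_state
            steps += 1
        return steps

    def update_policy(self, buffer, batch_size):
        batch = buffer.sample(batch_size)                  # a real agent would take a gradient step here
        return len(batch)


class StubBuffer(list):
    def sample(self, batch_size):
        idx = np.random.randint(len(self), size=batch_size)
        return [self[i] for i in idx]


class StubEvaluator:
    best_reward = -np.inf

    def evaluate_and_save(self, agent, env, target_reward):
        # run one evaluation episode and remember the best score
        episode_reward, state, done = 0.0, env.reset(), False
        while not done:
            state, reward, done, _ = env.step(env.action_space.sample())
            episode_reward += reward
        self.best_reward = max(self.best_reward, episode_reward)  # a real evaluator would save the model here
        return self.best_reward >= target_reward


env = gym.make("LunarLanderContinuous-v2")
agent, buffer, evaluator = StubAgent(), StubBuffer(), StubEvaluator()

total_step, break_step, target_reward = 0, 10_000, 200
while True:
    total_step += agent.update_buffer(env, buffer, target_step=512)   # explore
    agent.update_policy(buffer, batch_size=64)                        # learn
    reached_goal = evaluator.evaluate_and_save(agent, env, target_reward)
    if reached_goal or total_step > break_step:                       # target score or max steps
        break
```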