RL-Explorations

This repository contains implementations, notes, and experiments related to Reinforcement Learning (RL). It includes both foundational concepts and various toy projects exploring RL applications across different domains.

Of course, the first implementation is just a quick naive one of the multi-armed bandit problem based on Sutton's book, which contains action-value methods and the $\epsilon$-greedy strategy.

Foundational concepts are based off Sutton's Reinforcement Learning book.

The simulation

The repository contains three python modules which together simulate the dynamics of the multi-armed bandit problem.

Mathematical estimates

This sections aims to explore the simulation data in order to present some mathematical estimates and explanations behind some numbers the simulation presents.

Action-Value methods

Average reward method

Methods called Action-Value methods are the ones which are based off some estimate in order to act (greedy, $\epsilon$-greedy) and thus maximize some kind of cumulative reward.

The simplest method relies on averaging the rewards received.

$$Q_{t}(a) \doteq \frac{\sum_{i=1}^{t-1} R_i \cdot \mathbb{1}{A_i=a}}{\sum{i=1}^{t-1} \mathbb{1}_{A_i=a}}$$

in another words, the estimated value of action a at time step $t$ is then:

$$Q_n = \frac{R_1+R_2+\cdots+R_{n-1}}{n-1}$$

where $Q_{t}(a)$ denotes the estimated value of action $a$ at time step $t$ and $R_{i}$ denotes the reward received after the ith selection of a certain action.

To continue

Gradient Bandit Algorithms

Gradient algorithms are different. They are based off the fact that we can learn a numerical preference and after that set it in a way that we are more likely or not to chose it.

To continue

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
agent.py		agent.py
bandit.py		bandit.py
main.py		main.py
math.ipynb		math.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RL-Explorations

The simulation

Mathematical estimates

Action-Value methods

Average reward method

Gradient Bandit Algorithms

About

Releases

Packages

Languages

License

lucasfourier/RL-Explorations

Folders and files

Latest commit

History

Repository files navigation

RL-Explorations

The simulation

Mathematical estimates

Action-Value methods

Average reward method

Gradient Bandit Algorithms

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages