This repository contains implementations, notes, and experiments related to Reinforcement Learning (RL). It includes both foundational concepts and various toy projects exploring RL applications across different domains.
The first implementation is a quick, naive take on the multi-armed bandit problem from Sutton's book, covering action-value methods and gradient algorithms.
The foundational concepts are based on Sutton and Barto's *Reinforcement Learning: An Introduction*.
The repository contains three Python modules which together simulate the dynamics of the multi-armed bandit problem.
This section explores the simulation data in order to give mathematical estimates and explanations for some of the numbers the simulation produces.
Action-value methods are those that select actions based on an estimate of each action's value (greedy,
The simplest method relies on averaging the rewards received.
$$Q_{t}(a) \doteq \frac{\sum_{i=1}^{t-1} R_i \cdot \mathbb{1}_{A_i=a}}{\sum_{i=1}^{t-1} \mathbb{1}_{A_i=a}}$$
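In other words, the estimated value of action $a$ at time step $t$ is the sum of the rewards received when $a$ was selected before $t$, divided by the number of times $a$ was selected before $t$,
where $\mathbb{1}_{A_i=a}$ is 1 if action $a$ was selected at step $i$ and 0 otherwise.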
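
As a rough illustration of the sample-average estimate, the sketch below computes $Q_t(a)$ from a recorded history of actions and rewards. The function name `sample_average_estimates` and the `default` argument are hypothetical and not taken from the repository's modules. Returning a default value for actions that were never selected (where the denominator is zero) is an assumption, following the convention suggested in Sutton's book.

```python
def sample_average_estimates(actions, rewards, n_actions, default=0.0):
    """Return the sample-average value estimate Q(a) for every action.

    actions: action indices A_1..A_{t-1} (integers in [0, n_actions))
    rewards: rewards R_1..R_{t-1}, aligned with `actions`
    default: value used for actions never selected (assumption)
    """
    reward_sums = [0.0] * n_actions  # numerator: sum of R_i where A_i == a
    counts = [0] * n_actions         # denominator: number of i where A_i == a

    for a, r in zip(actions, rewards):
        reward_sums[a] += r
        counts[a] += 1

    return [
        reward_sums[a] / counts[a] if counts[a] > 0 else default
        for a in range(n_actions)
    ]


if __name__ == "__main__":
    history_actions = [0, 1, 0, 2, 1, 0]
    history_rewards = [1.0, 0.0, 3.0, 5.0, 2.0, 2.0]
    print(sample_average_estimates(history_actions, history_rewards, n_actions=4))
```

A greedy agent would then pick the action with the largest estimate, for example `max(range(n_actions), key=q.__getitem__)`.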
To continue
Gradient algorithms take a different approach. Instead of estimating action values, they learn a numerical preference for each action; the larger an action's preference relative to the others, the more likely that action is to be chosen.
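
To make the idea of preferences concrete, here is a minimal sketch of the gradient bandit scheme as described in Sutton's book: preferences are turned into selection probabilities with a soft-max, and after each reward the chosen action's preference moves up or down relative to a baseline. The function names, the `alpha` step size, and the way the baseline is passed in are assumptions made for illustration, and the repository's own modules may be organised differently.

```python
import math
import random


def softmax(preferences):
    """Convert preferences H(a) into selection probabilities pi(a)."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(h - m) for h in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def choose_action(preferences, rng=random):
    """Sample an action index according to the soft-max probabilities."""
    probs = softmax(preferences)
    return rng.choices(range(len(preferences)), weights=probs, k=1)[0]


def gradient_bandit_step(preferences, action, reward, baseline, alpha=0.1):
    """Return updated preferences after observing `reward` for `action`.

    H(a) <- H(a) + alpha * (reward - baseline) * (1[a == action] - pi(a))
    `baseline` is typically the running average of past rewards (assumption).
    """
    probs = softmax(preferences)
    advantage = reward - baseline
    return [
        h + alpha * advantage * ((1.0 if a == action else 0.0) - probs[a])
        for a, h in enumerate(preferences)
    ]


if __name__ == "__main__":
    prefs = [0.0, 0.0, 0.0]
    a = choose_action(prefs)
    prefs = gradient_bandit_step(prefs, action=a, reward=1.0, baseline=0.0)
    print(a, prefs, softmax(prefs))
```

When the reward is above the baseline, the selected action's preference rises and the others fall, so that action becomes more likely to be chosen; a reward below the baseline has the opposite effect.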
To continue