class 09
fbarth committed Mar 14, 2023
1 parent b2e5bba commit 1a4e965
Showing 37 changed files with 1,806 additions and 41 deletions.
20 changes: 20 additions & 0 deletions docs/classes/06_non_determ_comments/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Comments about the Frozen Lake implementations

The table below summarizes some results from the Frozen Lake implementations done last week.

| Feature | André | Beatriz | Carlos Dip | Diogo | Eduardo | Felipe | Henrique | Letícia | Lucas | Matheus | Nívea |
|:----------|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
|**algorithm**| Sarsa |Sarsa |Sarsa |Q-Learning |Q-Learning |Q-Learning |Q Learning |Sarsa |Sarsa |Sarsa |Sarsa|
|**alpha**| 0.1 | 0.098 | 0.09 | 0.1 | 0.2 | 0.1 | 0.1 | 0.03 | 0.2 | 0.05 | 0.12|
|**gamma**| 0.99 |0.98 |0.9 |0.99 |0.999 |0.99 |0.9 |0.98 |1 |0.95| 0.99|
|**epsilon**| 1 |1 | 0.75 | 0.999 | 0.9 | 0.9999 | 1 | 0.98 | 1 | 0.95 | 0.9|
|**epsilon_dec**| 0.9999 |0.9999 |0.05 |0.9999 |0.9999 |0.99996 |0.9999 |0.9999 |0.99| 0.9999 | 0.9999|
|**epsilon_min**| 0.05 |0.05| 0.99| 0.0001| 0.001| 0.0001| 0.1 |0.0001| 0.5| 0.0001| 0.0001|
|**# episodes**|50000 | 35000 | 10000 | 30000 | 30000 | 100000 | 10000 | 18000 | 500000 | 20000 | 30000 |
|**% of success**| 91%| 89%| 99%| 96%| 100%| 99%| 91%| 86%| 97%| 81%| 94%|

*Important notes*:

* Carlos Dip changed the reward values to emphasize negative rewards and speed up the training using Sarsa.
* Felipe did a very good evaluation of the epsilon decay impact. This evaluation is available [here](https://github.com/insper-classroom/frozen-lake-felipeschiavinato/blob/main/README.md).
* How did Eduardo's implementation achieve a 100% success rate?
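
The *epsilon*, *epsilon_dec*, and *epsilon_min* hyperparameters in the table interact through a decay schedule. A minimal sketch of the common multiplicative scheme (the individual implementations above may differ in detail):

```python
def decay_epsilon(epsilon, epsilon_dec, epsilon_min):
    # multiply epsilon by the decay factor each episode,
    # never letting it drop below epsilon_min
    return max(epsilon * epsilon_dec, epsilon_min)
```

With the hyperparameters from André's column, for example, epsilon starts at 1 and decays by a factor of 0.9999 per episode until it reaches the 0.05 floor.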
59 changes: 58 additions & 1 deletion docs/classes/07_game_env/index.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,61 @@
# Using RL in a competitive environment

The goal of this activity is to implement an agent to play tic-tac-toe using Q-Learning or Sarsa algorithms and show the results.

## Tic-tac-toe player

Do you remember this code?

```python
#
# Reference: https://pettingzoo.farama.org/environments/classic/tictactoe/
#

from pettingzoo.classic import tictactoe_v3

def play_random_agent(agent, obs):
    # sample random actions until one is legal according to the action mask
    x = env.action_space(agent).sample()
    while obs['action_mask'][x] != 1:
        x = env.action_space(agent).sample()
    return x

def play_my_agent(agent, obs):
    # TODO: you must implement your code here
    pass

env = tictactoe_v3.env(render_mode='human')
env.reset()

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        # a finished agent must still step, with a None action
        action = None
    elif agent == 'player_1':
        action = play_random_agent(agent, observation)
    else:
        action = play_my_agent(agent, observation)
    print(f'play: {action}')
    env.step(action)

print(env.rewards)
```

The exercise today is to **implement, using reinforcement learning, an agent that plays tic-tac-toe and always wins or draws, never losing**.

In this case, the states (*obs*) are represented by a matrix. How can each possible matrix configuration be transformed into a *state id*? How many states are possible? Is it possible to define a function that takes a matrix as input and generates an *id* for each state?
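
One possible encoding is sketched below; the `state_id` helper and the base-3 scheme are illustrative assumptions, not part of the required solution. Each of the 9 cells is either empty, the current player's, or the opponent's, so the board can be read as a 9-digit base-3 number:

```python
import numpy as np

def state_id(obs):
    # PettingZoo's tic-tac-toe observation is a 3x3x2 array:
    # plane 0 marks the current player's pieces, plane 1 the opponent's
    board = obs['observation']
    cells = board[:, :, 0] + 2 * board[:, :, 1]  # 0=empty, 1=mine, 2=theirs
    # read the 9 cells as digits of a base-3 number -> id in [0, 3**9)
    return int(sum(int(c) * 3**i for i, c in enumerate(cells.flatten())))
```

At most 3**9 = 19683 ids exist, and far fewer are reachable in a legal game, so a dictionary keyed by these ids also works as a q-table.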

## Deliver

* This exercise must be done by a group of 3 students.

* The **deadline is 03/15/2023 23:30 -0300.**

* The implementation must be delivered through *GitHub Classroom*, at [https://classroom.github.com/a/5MNmW_QO](https://classroom.github.com/a/5MNmW_QO).

* You must add everything necessary to run this project to the repository, such as a README file, a requirements file, and the code.


18 changes: 17 additions & 1 deletion docs/classes/08_game_env_random/index.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,20 @@
# Using RL in a competitive environment with random behavior

The goal of this activity is to implement an agent to play Blackjack using Q-Learning or Sarsa algorithms and show the results.

## Blackjack player

The Gym library has an environment that simulates a game of [Blackjack (*Blackjack-v1*)](https://www.gymlibrary.dev/environments/toy_text/blackjack/). The documentation for this environment is available at [https://www.gymlibrary.dev/environments/toy_text/blackjack/](https://www.gymlibrary.dev/environments/toy_text/blackjack/).

Besides the documentation, you have access to two implementations:

* [BlackJack_Manual.py](./BlackJack_Manual.py): where you can play several rounds of Blackjack and understand the state representation adopted by the environment, and;
* [BlackJack_Agent.py](./BlackJack_Agent.py): which has an implementation of an agent that learns to play Blackjack using reinforcement learning.

Proposed activities:

* Run the file [BlackJack_Manual.py](./BlackJack_Manual.py) several times to understand how the environment works, especially how the state-space representation works.

* Run the file [BlackJack_Agent.py](./BlackJack_Agent.py) to create a new q-table. What is the agent's performance?

* How can we obtain an agent with the best possible performance? Is it possible to create an agent that wins or draws at least 85% of the games? If so, what are the hyperparameters for this agent? If not, what is the best result found?
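
As a sketch of how the Blackjack-v1 observation can index a flat q-table (the `blackjack_state_id` helper below is an illustrative assumption, not part of the provided files): the observation is a tuple *(player sum, dealer's visible card, usable ace)*, with spaces Discrete(32), Discrete(11), and Discrete(2), so a unique flat index fits in 32 * 11 * 2 = 704 slots:

```python
def blackjack_state_id(obs):
    # obs = (player_sum, dealer_card, usable_ace), with
    # player_sum in [0, 31], dealer_card in [0, 10], usable_ace in {0, 1}
    player_sum, dealer_card, usable_ace = obs
    return (player_sum * 11 + dealer_card) * 2 + int(usable_ace)
```

Since the observation is already a small tuple of integers, using the tuple itself as a dictionary key is an equally valid design choice.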
Binary file added frozenlake.xlsx
Binary file not shown.
3 changes: 2 additions & 1 deletion mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,8 @@ nav:
- 'classes/05_x_sarsa/index.md'
- 'classes/05_xx_comments/index.md'
- 'classes/06_non_determ/index.md'
# - 'classes/07_game_env/index.md'
- 'classes/06_non_determ_comments/index.md'
- 'classes/07_game_env/index.md'
# - 'classes/08_game_env_random/index.md'
# - 'classes/09_more_complex/index.md'
# - 'classes/11_evaluation/index.md'
Expand Down
28 changes: 28 additions & 0 deletions site/404.html
Original file line number Diff line number Diff line change
Expand Up @@ -391,6 +391,34 @@








<li class="md-nav__item">
<a href="/rl/classes/06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="/rl/classes/07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
28 changes: 28 additions & 0 deletions site/_snippets/plan/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -393,6 +393,34 @@








<li class="md-nav__item">
<a href="../../classes/06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../../classes/07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
30 changes: 29 additions & 1 deletion site/assessment/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -450,6 +450,34 @@








<li class="md-nav__item">
<a href="../classes/06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../classes/07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down Expand Up @@ -593,7 +621,7 @@ <h2 id="conversao-de-conceito-para-valor-numerico">Conversão de conceito para v
<small>

Last update:
<span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-date">March 7, 2023</span>
<span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-date">March 9, 2023</span>


</small>
Expand Down
28 changes: 28 additions & 0 deletions site/classes/01_introduction/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -466,6 +466,34 @@








<li class="md-nav__item">
<a href="../06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
28 changes: 28 additions & 0 deletions site/classes/01_introduction/introduction_rl/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -400,6 +400,34 @@








<li class="md-nav__item">
<a href="../../06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../../07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
28 changes: 28 additions & 0 deletions site/classes/01_introduction/subject_rules/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -400,6 +400,34 @@








<li class="md-nav__item">
<a href="../../06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../../07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
28 changes: 28 additions & 0 deletions site/classes/02_problem_solving/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -487,6 +487,34 @@








<li class="md-nav__item">
<a href="../06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
