@@ -0,0 +1,20 @@
# Comments about the Frozen Lake implementations

The table below summarizes the results of the Frozen Lake implementations submitted last week.

| Feature | André | Beatriz | Carlos Dip | Diogo | Eduardo | Felipe | Henrique | Letícia | Lucas | Matheus | Nívea |
|:----------|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
| **algorithm** | Sarsa | Sarsa | Sarsa | Q-Learning | Q-Learning | Q-Learning | Q-Learning | Sarsa | Sarsa | Sarsa | Sarsa |
| **alpha** | 0.1 | 0.098 | 0.09 | 0.1 | 0.2 | 0.1 | 0.1 | 0.03 | 0.2 | 0.05 | 0.12 |
| **gamma** | 0.99 | 0.98 | 0.9 | 0.99 | 0.999 | 0.99 | 0.9 | 0.98 | 1 | 0.95 | 0.99 |
| **epsilon** | 1 | 1 | 0.75 | 0.999 | 0.9 | 0.9999 | 1 | 0.98 | 1 | 0.95 | 0.9 |
| **epsilon_dec** | 0.9999 | 0.9999 | 0.05 | 0.9999 | 0.9999 | 0.99996 | 0.9999 | 0.9999 | 0.99 | 0.9999 | 0.9999 |
| **epsilon_min** | 0.05 | 0.05 | 0.99 | 0.0001 | 0.001 | 0.0001 | 0.1 | 0.0001 | 0.5 | 0.0001 | 0.0001 |
| **# episodes** | 50000 | 35000 | 10000 | 30000 | 30000 | 100000 | 10000 | 18000 | 500000 | 20000 | 30000 |
| **% of success** | 91% | 89% | 99% | 96% | 100% | 99% | 91% | 86% | 97% | 81% | 94% |

*Important notes*:

* Carlos Dip changed the reward values to emphasize negative rewards and speed up training with Sarsa.
* Felipe did a very good evaluation of the impact of epsilon decay. This evaluation is available [here](https://github.com/insper-classroom/frozen-lake-felipeschiavinato/blob/main/README.md).
* How did Eduardo's implementation reach a 100% success rate?
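
To compare the epsilon columns in the table, it can help to compute how long exploration lasts before epsilon hits its floor. The sketch below assumes a multiplicative schedule applied once per episode, `epsilon = max(epsilon_min, epsilon * epsilon_dec)`. The table does not state how each implementation applies `epsilon_dec`, so this rule and the helper names `epsilon_after` and `episodes_until_min` are assumptions made for illustration only.

```python
import math


def epsilon_after(epsilon, epsilon_dec, epsilon_min, episodes):
    """Epsilon after `episodes` multiplicative decay steps, floored at epsilon_min."""
    return max(epsilon_min, epsilon * epsilon_dec ** episodes)


def episodes_until_min(epsilon, epsilon_dec, epsilon_min):
    """Smallest number of decay steps after which epsilon is at its floor.

    Returns 0 if epsilon starts at or below epsilon_min, and None when the
    floor is never reached (epsilon_dec >= 1, or a non-positive floor).
    Assumes 0 < epsilon_dec.
    """
    if epsilon <= epsilon_min:
        return 0
    if epsilon_dec >= 1 or epsilon_min <= 0:
        return None
    # Solve epsilon * epsilon_dec**n <= epsilon_min for the smallest integer n.
    return math.ceil(math.log(epsilon_min / epsilon) / math.log(epsilon_dec))


if __name__ == "__main__":
    # (epsilon, epsilon_dec, epsilon_min, episodes) copied from two table columns.
    runs = {
        "André": (1.0, 0.9999, 0.05, 50000),
        "Lucas": (1.0, 0.99, 0.5, 500000),
    }
    for name, (eps, dec, eps_min, n_episodes) in runs.items():
        steps = episodes_until_min(eps, dec, eps_min)
        print(f"{name}: floor reached after {steps} of {n_episodes} episodes")
```

Under this assumed schedule, the ratio between the number of decay steps and the total number of episodes shows how much of the training is spent at the minimum exploration rate.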
@@ -1,4 +1,61 @@
# Using RL in a competitive environment

The goal of this activity is to implement an agent that plays tic-tac-toe using the Q-Learning or Sarsa algorithm and to show the results.

## Tic-tac-toe player

Do you remember this code?

```python
#
# Reference: https://pettingzoo.farama.org/environments/classic/tictactoe/
#

from pettingzoo.classic import tictactoe_v3


def play_random_agent(agent, obs):
    # Sample actions until one is legal according to the action mask.
    x = env.action_space(agent).sample()
    while obs['action_mask'][x] != 1:
        x = env.action_space(agent).sample()
    return x


def play_my_agent(agent, obs):
    # TODO you must implement your code here
    pass


env = tictactoe_v3.env(render_mode='human')
env.reset()

not_finish = True
while not_finish:
    for agent in ['player_1', 'player_2']:
        # env.last() returns the data for the agent that plays next.
        observation, reward, termination, truncation, info = env.last()
        if termination or truncation:
            not_finish = False
            break
        else:
            if agent == 'player_1':
                action = play_random_agent(agent, observation)
            else:
                action = play_my_agent(agent, observation)
            print('play: ', action)
            env.step(action)

print(env.rewards)
env.close()
```

The exercise today is to **implement an agent that uses reinforcement learning to play tic-tac-toe and that wins or draws but never loses**.

In this case, the states (*obs*) are represented by a matrix. How can each possible matrix configuration be transformed into a *state id*? How many states are possible? Is it possible to define a function that takes a matrix as input and generates an *id* for each state?

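One possible way to approach these questions is to read the board as a nine-digit base-3 number. The sketch below is a minimal illustration, assuming each cell holds 0 (empty), 1 (own mark) or 2 (opponent mark). The function names `board_to_state_id` and `planes_to_board` are hypothetical, and the two-plane layout handled by `planes_to_board` (shape 3x3x2, plane 0 for the current player, plane 1 for the opponent) is an assumption about `obs['observation']` that should be checked against the PettingZoo documentation.

```python
def planes_to_board(planes):
    """Collapse an assumed 3x3x2 observation into a 3x3 matrix of 0/1/2 values."""
    board = []
    for r in range(3):
        row = []
        for c in range(3):
            if planes[r][c][0]:      # current player's mark
                row.append(1)
            elif planes[r][c][1]:    # opponent's mark
                row.append(2)
            else:                    # empty cell
                row.append(0)
        board.append(row)
    return board


def board_to_state_id(board):
    """Map a 3x3 board with cells in {0, 1, 2} to an integer in [0, 3**9)."""
    state_id = 0
    for row in board:
        for cell in row:
            # Each cell is one base-3 digit; the top-left cell is the most significant.
            state_id = state_id * 3 + int(cell)
    return state_id


if __name__ == "__main__":
    empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    print(board_to_state_id(empty))  # id of the empty board
    print(3 ** 9)                    # upper bound on the number of ids
```

This encoding gives an upper bound of `3**9` ids. Many of them correspond to boards that cannot occur in a real game, so a dictionary keyed by the id would only store the states actually visited during training.
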
## Delivery

* This exercise must be done in groups of 3 students.

* The **deadline is 03/15/2023 23:30 -0300.**

* The implementation must be delivered through *Github classroom*, using this link: [https://classroom.github.com/a/5MNmW_QO](https://classroom.github.com/a/5MNmW_QO).

* The repository must contain everything necessary to run the project, such as the README file, the requirements file and the code.

@@ -1,4 +1,20 @@
# Using RL in a competitive environment with random behavior

The goal of this activity is to implement an agent that plays Blackjack using the Q-Learning or Sarsa algorithm and to show the results.

## BlackJack player

The Gym library has an environment that simulates a game of [BlackJack (*Blackjack-v1*)](https://www.gymlibrary.dev/environments/toy_text/blackjack/). The documentation for this environment is available at [https://www.gymlibrary.dev/environments/toy_text/blackjack/](https://www.gymlibrary.dev/environments/toy_text/blackjack/).

Besides the documentation, you have access to two implementations:

* [BlackJack_Manual.py](./BlackJack_Manual.py): where you can play several rounds of BlackJack and understand the state representation adopted by the environment; and
* [BlackJack_Agent.py](./BlackJack_Agent.py): which contains an implementation of an agent that learns to play BlackJack using reinforcement learning.
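
As a starting point for reading the state representation, the sketch below flattens a Blackjack observation into a single q-table row index. It assumes the observation is the tuple `(player_sum, dealer_card, usable_ace)` with component sizes 32, 11 and 2, as described in the environment documentation linked above. The function name `blackjack_state_index` and the constants are hypothetical and are not taken from `BlackJack_Agent.py`.

```python
# Assumed sizes of the three observation components.
N_PLAYER_SUMS = 32   # player's current sum
N_DEALER_CARDS = 11  # dealer's visible card
N_USABLE_ACE = 2     # whether the player holds a usable ace (0 or 1)

N_STATES = N_PLAYER_SUMS * N_DEALER_CARDS * N_USABLE_ACE


def blackjack_state_index(obs):
    """Map (player_sum, dealer_card, usable_ace) to an integer in [0, N_STATES)."""
    player_sum, dealer_card, usable_ace = obs
    index = int(player_sum) * N_DEALER_CARDS + int(dealer_card)
    return index * N_USABLE_ACE + int(usable_ace)


if __name__ == "__main__":
    # Example: player holds 14, dealer shows a 10, no usable ace.
    print(blackjack_state_index((14, 10, 0)), "of", N_STATES)
```

A q-table built this way would have `N_STATES` rows and one column per action. Indexing a NumPy array directly with the three components is an equally valid design.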

Proposed activities:

* Run the file [BlackJack_Manual.py](./BlackJack_Manual.py) several times to understand how the environment works, especially how the state space is represented.

* Run the file [BlackJack_Agent.py](./BlackJack_Agent.py) in order to create a new q-table. How well does the agent perform?

* How can we obtain an agent with the best possible performance? Is it possible to create an agent that wins or draws in at least 85% of the games? If so, what are the hyperparameters for this agent? If not, what is the best result found?
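
To check a target such as 85%, the outcomes of many evaluation episodes need to be counted. The sketch below assumes each episode ends with a final reward that is positive for a win, zero for a draw and negative for a loss, which is how the Blackjack environment documentation describes its rewards. `win_or_draw_rate` is a hypothetical helper, not part of the provided files.

```python
def win_or_draw_rate(final_rewards):
    """Fraction of episodes whose final reward is not a loss (reward >= 0)."""
    rewards = list(final_rewards)
    if not rewards:
        raise ValueError("need at least one episode to compute a rate")
    not_lost = sum(1 for r in rewards if r >= 0)
    return not_lost / len(rewards)


if __name__ == "__main__":
    # One win, one draw and two losses: half of the games are not lost.
    print(win_or_draw_rate([1, 0, -1, -1]))
```

The rate is only meaningful when it is measured with exploration switched off and over enough episodes to average out the randomness of the cards.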