class 09
fbarth committed Mar 14, 2023
1 parent b2e5bba commit 1a4e965
Showing 37 changed files with 1,806 additions and 41 deletions.
20 changes: 20 additions & 0 deletions docs/classes/06_non_determ_comments/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Comments about the Frozen Lake implementations

The table below summarizes some results from the Frozen Lake implementations done last week.

| Feature | André | Beatriz | Carlos Dip | Diogo | Eduardo | Felipe | Henrique | Letícia | Lucas | Matheus | Nívea |
|:----------|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|
|**algorithm**| Sarsa |Sarsa |Sarsa |Q-Learning |Q-Learning |Q-Learning |Q Learning |Sarsa |Sarsa |Sarsa |Sarsa|
|**alpha**| 0.1 | 0.098 | 0.09 | 0.1 | 0.2 | 0.1 | 0.1 | 0.03 | 0.2 | 0.05 | 0.12|
|**gamma**| 0.99 |0.98 |0.9 |0.99 |0.999 |0.99 |0.9 |0.98 |1 |0.95| 0.99|
|**epsilon**| 1 |1 | 0.75 | 0.999 | 0.9 | 0.9999 | 1 | 0.98 | 1 | 0.95 | 0.9|
|**epsilon_dec**| 0.9999 |0.9999 |0.05 |0.9999 |0.9999 |0.99996 |0.9999 |0.9999 |0.99| 0.9999 | 0.9999|
|**epsilon_min**| 0.05 |0.05| 0.99| 0.0001| 0.001| 0.0001| 0.1 |0.0001| 0.5| 0.0001| 0.0001|
|**# episodes**|50000 | 35000 | 10000 | 30000 | 30000 | 100000 | 10000 | 18000 | 500000 | 20000 | 30000 |
|**% of success**| 91%| 89%| 99%| 96%| 100%| 99%| 91%| 86%| 97%| 81%| 94%|

*Important notes*:

* Carlos Dip changed the reward values to emphasize negative rewards and speed up the training using Sarsa.
* Felipe did a very good evaluation of the epsilon decay impact. This evaluation is available [here](https://github.com/insper-classroom/frozen-lake-felipeschiavinato/blob/main/README.md).
* How did Eduardo's implementation achieve a 100% success rate?
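
The *epsilon*, *epsilon_dec*, and *epsilon_min* hyperparameters in the table interact through a decay schedule. A minimal sketch of the common multiplicative scheme (the individual implementations above may differ in detail):

```python
def decay_epsilon(epsilon, epsilon_dec, epsilon_min):
    # multiply epsilon by the decay factor each episode,
    # never letting it drop below epsilon_min
    return max(epsilon * epsilon_dec, epsilon_min)
```

With the hyperparameters from André's column, for example, epsilon starts at 1 and decays by a factor of 0.9999 per episode until it reaches the 0.05 floor.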
59 changes: 58 additions & 1 deletion docs/classes/07_game_env/index.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,61 @@
# Using RL in a competitive environment

The goal of this activity is to implement an agent to play tic-tac-toe using Q-Learning or Sarsa algorithms and show the results.

## Tic-tac-toe player

Do you remember this code?

```python
#
# Reference: https://pettingzoo.farama.org/environments/classic/tictactoe/
#

from pettingzoo.classic import tictactoe_v3

def play_random_agent(agent, obs):
    # sample random actions until one is legal according to the action mask
    x = env.action_space(agent).sample()
    while obs['action_mask'][x] != 1:
        x = env.action_space(agent).sample()
    return x

def play_my_agent(agent, obs):
    # TODO: you must implement your code here
    pass

env = tictactoe_v3.env(render_mode='human')
env.reset()

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        # a finished agent must still step, with a None action
        action = None
    elif agent == 'player_1':
        action = play_random_agent(agent, observation)
    else:
        action = play_my_agent(agent, observation)
    print(f'play: {action}')
    env.step(action)

print(env.rewards)
```

The exercise today is to **implement, using reinforcement learning, an agent that plays tic-tac-toe and always wins or draws, never losing**.

In this case, the states (*obs*) are represented by a matrix. How can each possible matrix configuration be transformed into a *state id*? How many states are possible? Is it possible to define a function that takes a matrix as input and generates an *id* for each state?
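
One possible encoding is sketched below; the `state_id` helper and the base-3 scheme are illustrative assumptions, not part of the required solution. Each of the 9 cells is either empty, the current player's, or the opponent's, so the board can be read as a 9-digit base-3 number:

```python
import numpy as np

def state_id(obs):
    # PettingZoo's tic-tac-toe observation is a 3x3x2 array:
    # plane 0 marks the current player's pieces, plane 1 the opponent's
    board = obs['observation']
    cells = board[:, :, 0] + 2 * board[:, :, 1]  # 0=empty, 1=mine, 2=theirs
    # read the 9 cells as digits of a base-3 number -> id in [0, 3**9)
    return int(sum(int(c) * 3**i for i, c in enumerate(cells.flatten())))
```

At most 3**9 = 19683 ids exist, and far fewer are reachable in a legal game, so a dictionary keyed by these ids also works as a q-table.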

## Deliver

* This exercise must be done by a group of 3 students.

* The **deadline is 03/15/2023 23:30 -0300.**

* The implementation must be delivered through *GitHub Classroom*, at [https://classroom.github.com/a/5MNmW_QO](https://classroom.github.com/a/5MNmW_QO).

* You must add everything necessary to run this project to the repository, such as a README file, a requirements file, and the code.


18 changes: 17 additions & 1 deletion docs/classes/08_game_env_random/index.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,20 @@
# Using RL in a competitive environment with random behavior

The goal of this activity is to implement an agent to play Blackjack using Q-Learning or Sarsa algorithms and show the results.

## Blackjack player

The Gym library has an environment that simulates a game of [Blackjack (*Blackjack-v1*)](https://www.gymlibrary.dev/environments/toy_text/blackjack/). The documentation for this environment is available at [https://www.gymlibrary.dev/environments/toy_text/blackjack/](https://www.gymlibrary.dev/environments/toy_text/blackjack/).

Besides the documentation, you have access to two implementations:

* [BlackJack_Manual.py](./BlackJack_Manual.py): where you can play several rounds of Blackjack and understand the state representation adopted by the environment, and;
* [BlackJack_Agent.py](./BlackJack_Agent.py): which has an implementation of an agent that learns to play Blackjack using reinforcement learning.

Proposed activities:

* Run the file [BlackJack_Manual.py](./BlackJack_Manual.py) several times to understand how the environment works, especially how the state-space representation works.

* Run the file [BlackJack_Agent.py](./BlackJack_Agent.py) to create a new q-table. What is the agent's performance?

* How can we obtain an agent with the best possible performance? Is it possible to create an agent that wins or draws at least 85% of the games? If so, what are the hyperparameters for this agent? If not, what is the best result found?
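
As a sketch of how the Blackjack-v1 observation can index a flat q-table (the `blackjack_state_id` helper below is an illustrative assumption, not part of the provided files): the observation is a tuple *(player sum, dealer's visible card, usable ace)*, with spaces Discrete(32), Discrete(11), and Discrete(2), so a unique flat index fits in 32 * 11 * 2 = 704 slots:

```python
def blackjack_state_id(obs):
    # obs = (player_sum, dealer_card, usable_ace), with
    # player_sum in [0, 31], dealer_card in [0, 10], usable_ace in {0, 1}
    player_sum, dealer_card, usable_ace = obs
    return (player_sum * 11 + dealer_card) * 2 + int(usable_ace)
```

Since the observation is already a small tuple of integers, using the tuple itself as a dictionary key is an equally valid design choice.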
Binary file added frozenlake.xlsx
Binary file not shown.
3 changes: 2 additions & 1 deletion mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,8 @@ nav:
- 'classes/05_x_sarsa/index.md'
- 'classes/05_xx_comments/index.md'
- 'classes/06_non_determ/index.md'
# - 'classes/07_game_env/index.md'
- 'classes/06_non_determ_comments/index.md'
- 'classes/07_game_env/index.md'
# - 'classes/08_game_env_random/index.md'
# - 'classes/09_more_complex/index.md'
# - 'classes/11_evaluation/index.md'
Expand Down
28 changes: 28 additions & 0 deletions site/404.html
Original file line number Diff line number Diff line change
Expand Up @@ -391,6 +391,34 @@








<li class="md-nav__item">
<a href="/rl/classes/06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="/rl/classes/07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
28 changes: 28 additions & 0 deletions site/_snippets/plan/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -393,6 +393,34 @@








<li class="md-nav__item">
<a href="../../classes/06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../../classes/07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
30 changes: 29 additions & 1 deletion site/assessment/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -450,6 +450,34 @@








<li class="md-nav__item">
<a href="../classes/06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../classes/07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down Expand Up @@ -593,7 +621,7 @@ <h2 id="conversao-de-conceito-para-valor-numerico">Conversão de conceito para v
<small>

Last update:
<span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-date">March 7, 2023</span>
<span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-date">March 9, 2023</span>


</small>
Expand Down
28 changes: 28 additions & 0 deletions site/classes/01_introduction/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -466,6 +466,34 @@








<li class="md-nav__item">
<a href="../06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
28 changes: 28 additions & 0 deletions site/classes/01_introduction/introduction_rl/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -400,6 +400,34 @@








<li class="md-nav__item">
<a href="../../06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../../07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
28 changes: 28 additions & 0 deletions site/classes/01_introduction/subject_rules/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -400,6 +400,34 @@








<li class="md-nav__item">
<a href="../../06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../../07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
28 changes: 28 additions & 0 deletions site/classes/02_problem_solving/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -487,6 +487,34 @@








<li class="md-nav__item">
<a href="../06_non_determ_comments/" class="md-nav__link">
Comments about the Frozen Lake implementations
</a>
</li>









<li class="md-nav__item">
<a href="../07_game_env/" class="md-nav__link">
Using RL in a competitive environment
</a>
</li>




</ul>
</nav>
</li>
Expand Down
