Commit
Showing 33 changed files with 425 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
.vscode | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,40 @@ | ||
# rl | ||
Reinforcement Learning subject | ||
# Reinforcement Learning | ||
|
||
This repository has the Reinforcement Learning subject material. | ||
|
||
## Offerings | ||
|
||
* 2023/1 - Fabrício Barth | ||
|
||
## How to setup the environment | ||
|
||
```bash | ||
python3.7 -m virtualenv venv | ||
source venv/bin/activate | ||
python -m pip install --upgrade pip | ||
pip install -r requirements.txt | ||
``` | ||
|
||
## How to compile slides | ||
|
||
```bash | ||
pandoc -t beamer slides.md -o slides.pdf | ||
``` | ||
|
||
## How to deploy the web page | ||
|
||
```bash | ||
mkdocs gh-deploy | ||
``` | ||
|
||
## How to run the web server locally | ||
|
||
```bash | ||
mkdocs serve | ||
``` | ||
|
||
## How to publish the lesson plan | ||
|
||
```bash | ||
python publish_lessons_plan.py | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
| Date | Content | | ||
|:--------------------|:------------------------------------------------------------------------| | ||
| 2023-02-07 00:00:00 | Introduction to Reinforcement Learning | | ||
| 2023-02-09 00:00:00 | Problem-solving searching review | | ||
| 2023-02-14 00:00:00 | Adversarial search and games review | | ||
| 2023-02-16 00:00:00 | Reinforcement Learning Tooling and Environments | | ||
| 2023-02-23 00:00:00 | Q-Learning Algorithm | | ||
| 2023-02-28 00:00:00 | Q-Learning Algorithm | | ||
| 2023-03-02 00:00:00 | SARSA Algorithm | | ||
| 2023-03-07 00:00:00 | How to evaluate the performance of an agent? | | ||
| 2023-03-09 00:00:00 | Using RL in non-deterministic environments | | ||
| 2023-03-14 00:00:00 | Using RL in a competitive environment | | ||
| 2023-03-16 00:00:00 | Using RL in a competitive environment with random behavior | | ||
| 2023-03-21 00:00:00 | Implementing an agent to deal with a slightly more complex environment  | | ||
| 2023-03-23 00:00:00 | Deep Neural Networks review | | ||
| 2023-03-28 00:00:00 | Deep Neural Networks review | | ||
| 2023-03-30 00:00:00 | Midterm assessment - we do not have class | | ||
| 2023-04-04 00:00:00 | Midterm assessment - we do not have class | | ||
| 2023-04-06 00:00:00 | We do not have class | | ||
| 2023-04-11 00:00:00 | Neural Network Policies | | ||
| 2023-04-13 00:00:00 | Deep Q-Learning | | ||
| 2023-04-18 00:00:00 | Deep Q-Learning | | ||
| 2023-04-20 00:00:00 | Double Deep Q-Learning | | ||
| 2023-04-25 00:00:00 | Double Deep Q-Learning | | ||
| 2023-04-27 00:00:00 | Policy Optimization Algorithms (PPO) | | ||
| 2023-05-02 00:00:00 | Policy Optimization Algorithms (PPO) | | ||
| 2023-05-04 00:00:00 | Implementation of RL using TF-Agents | | ||
| 2023-05-09 00:00:00 | Implementation of RL using TF-Agents | | ||
| 2023-05-11 00:00:00 | Final Project | | ||
| 2023-05-16 00:00:00 | Final Project | | ||
| 2023-05-18 00:00:00 | Final Project | | ||
| 2023-05-23 00:00:00 | Final Project | | ||
| 2023-05-25 00:00:00 | Final Project | | ||
| 2023-05-30 00:00:00 | Final Project | | ||
| 2023-06-01 00:00:00 | Final Assessment - we do not have class | | ||
| 2023-06-06 00:00:00 | Final Assessment - we do not have class | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Student Assessment |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Introduction to Reinforcement Learning | ||
|
||
1. Definition and key concepts | ||
1. Differences with other machine learning techniques | ||
1. Real-world applications | ||
|
||
1. How will this subject work? | ||
1. Requirements | ||
1. This is a hands-on subject! | ||
1. Content | ||
1. Assignments | ||
|
||
## Activities for the next class | ||
|
||
1. Read part "II Problem-solving" of the AIMA book, or search the internet for material on problem-solving search algorithms. | ||
|
||
## References | ||
|
||
* xxxx |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Problem-solving searching review | ||
|
||
1. Problem-solving searching review | ||
1. Exercise: the implementation of a taxi driver agent |
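As a possible starting point for the taxi driver exercise, the sketch below runs breadth-first search on a small grid. The grid encoding (`.` for free cells, `#` for blocked cells) and the function name `shortest_path` are assumptions made for illustration and are not part of the course material.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Breadth-first search on a grid of '.' (free) and '#' (blocked) cells.

    Returns the list of (row, col) cells from start to goal, or None
    when the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    parent = {start: None}  # also serves as the visited set
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None
```

Because every move has the same cost here, breadth-first search returns a path with the fewest moves; with varying costs, uniform-cost search or A* would be the usual choice.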
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Adversarial search and games review | ||
|
||
1. Adversarial search and games review | ||
1. Exercise: the implementation of a tic-tac-toe player. |
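One way to approach the tic-tac-toe exercise is plain minimax without pruning. The sketch below assumes the board is a 9-character string of `X`, `O`, and spaces, and scores positions from X's point of view; these conventions are illustrative choices, not requirements of the exercise.

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'X' or 'O' if a line is complete, otherwise None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, player):
    """Return (score, move) for the player to move.

    Scores are from X's point of view: +1 X wins, -1 O wins, 0 draw.
    X maximizes the score and O minimizes it.
    """
    won = winner(board)
    if won is not None:
        return (1 if won == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None
    best_score, best_move = None, None
    for move in moves:
        child = board[:move] + player + board[move + 1:]
        score, _ = minimax(child, "O" if player == "X" else "X")
        if best_score is None:
            better = True
        elif player == "X":
            better = score > best_score
        else:
            better = score < best_score
        if better:
            best_score, best_move = score, move
    return best_score, best_move
```

Searching from an empty board visits several hundred thousand positions, so alpha-beta pruning or memoization is worth adding for interactive play.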
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Reinforcement Learning Tooling and Environments | ||
|
||
1. [The Farama Foundation](https://farama.org/Announcing-The-Farama-Foundation) | ||
1. Other tools and environments. | ||
1. How to use [Gymnasium API](https://gymnasium.farama.org/). | ||
1. Playing with Gymnasium API. |
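The interaction loop that the Gymnasium API is built around can be sketched without installing anything. `CorridorEnv` below is a hypothetical toy environment, not a Gymnasium class; it only mimics the shape of the API, where `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor that mimics the shape of Gymnasium's API."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self, seed=None):
        # Like Gymnasium's reset, return (observation, info).
        self.position = 0
        return self.position, {}

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        terminated = self.position == self.length - 1
        reward = 1.0 if terminated else 0.0
        # Like Gymnasium's step: (observation, reward, terminated, truncated, info).
        return self.position, reward, terminated, False, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the sum of rewards."""
    observation, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```

With Gymnasium installed, the same `run_episode` loop should work on an environment created with `gymnasium.make(...)`, since it relies only on `reset` and `step`.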
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Q-Learning Algorithm | ||
|
||
1. Definition and key concepts | ||
1. Implementation | ||
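A minimal tabular sketch of the Q-Learning update and an epsilon-greedy action choice follows. The table is assumed to be a `defaultdict` keyed by `(state, action)`, and the function names and default hyperparameters are illustrative rather than taken from the course.

```python
import random
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Apply one Q-Learning update in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    if done:
        best_next = 0.0  # no bootstrapping from a terminal state
    else:
        best_next = max(q[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def epsilon_greedy(q, state, n_actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q[(state, a)])


# Usage: the table starts at zero for every (state, action) pair.
q_table = defaultdict(float)
```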
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# SARSA Algorithm | ||
|
||
1. Definition and key concepts | ||
1. The main difference between Q-Learning and SARSA | ||
1. Implementation | ||
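The main difference between the two algorithms shows up in the bootstrap target. In the sketch below (function names are illustrative, and `q` is assumed to be a mapping keyed by `(state, action)`), SARSA uses the value of the action the agent actually takes next, while Q-Learning uses the value of the greedy action.

```python
def sarsa_target(q, reward, next_state, next_action, done, gamma=0.99):
    """On-policy target: bootstrap from the action that will be taken next."""
    if done:
        return reward
    return reward + gamma * q[(next_state, next_action)]


def q_learning_target(q, reward, next_state, n_actions, done, gamma=0.99):
    """Off-policy target: bootstrap from the greedy action in next_state."""
    if done:
        return reward
    return reward + gamma * max(q[(next_state, a)] for a in range(n_actions))
```

When the behavior policy is fully greedy the two targets coincide; they differ whenever exploration picks a non-greedy next action.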
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# How to evaluate the performance of an agent? | ||
|
||
1. Metrics | ||
1. How to summarize results | ||
1. Exercise: compare Q-Learning and SARSA algorithms considering a deterministic environment |
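A few candidate metrics can be computed from a list of per-episode returns. The choice of metrics, the `success_threshold` parameter, and the function names below are assumptions for illustration; the course may define its own.

```python
from statistics import mean, stdev


def summarize_returns(returns, success_threshold=0.0):
    """Summarize per-episode returns from an evaluation run."""
    n = len(returns)
    return {
        "episodes": n,
        "mean_return": mean(returns),
        "std_return": stdev(returns) if n > 1 else 0.0,
        # Fraction of episodes whose return exceeds the threshold.
        "success_rate": sum(r > success_threshold for r in returns) / n,
    }


def moving_average(values, window):
    """Smooth a learning curve; early points average over fewer values."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

Comparing two agents on the same summary, ideally over several random seeds, gives a fairer picture than a single training curve.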
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Using RL in non-deterministic environments | ||
|
||
1. Exercise: implement two agents for the Frozen Lake problem, one using Q-Learning and one using SARSA, and compare the results. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Using RL in a competitive environment | ||
|
||
1. Exercise: implement an agent to play tic-tac-toe using Q-Learning or Sarsa algorithms and show the results. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Using RL in a competitive environment with random behavior | ||
|
||
1. Exercise: implement an agent to play Blackjack using Q-Learning or Sarsa algorithms and show the results. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Implementing an agent to deal with a slightly more complex environment | ||
|
||
1. Exercise: implement an agent to run a mountain car. | ||
1. Discussion: how can we implement RL agents for environments like LunarLander, Atari, and others? |
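Mountain car has continuous observations (position and velocity), so a tabular method needs them mapped to discrete indices first. The sketch below is one common way to do that by uniform binning; the bounds match the documented MountainCar observation ranges, while the bin counts and the function name are arbitrary assumptions.

```python
# Observation bounds for MountainCar: (position, velocity).
LOWS = (-1.2, -0.07)
HIGHS = (0.6, 0.07)
BINS = (20, 14)  # assumed resolution, tune as needed


def discretize(observation, lows=LOWS, highs=HIGHS, bins=BINS):
    """Map a continuous observation to a tuple of bin indices."""
    state = []
    for value, low, high, n in zip(observation, lows, highs, bins):
        clipped = min(max(value, low), high)
        index = int((clipped - low) / (high - low) * n)
        state.append(min(index, n - 1))  # the upper bound falls in the last bin
    return tuple(state)
```

The resulting tuple can be used directly as a key in a Q-table. This approach scales poorly as the number of dimensions grows, which is one motivation for function approximation in environments like LunarLander and Atari.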
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Deep Neural Networks review | ||
|
||
1. Neural Networks | ||
1. Gradient descent and optimization | ||
1. Exercise: implement a neural network. |
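Gradient descent can be illustrated on a one-variable function before applying it to network weights. The sketch below minimizes f(x) = (x - 3)^2, whose gradient is 2(x - 3); the function, learning rate, and step count are illustrative choices.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


def grad_f(x):
    """Gradient of f(x) = (x - 3)^2."""
    return 2 * (x - 3)


x_min = gradient_descent(grad_f, x0=0.0)  # approaches the minimum at x = 3
```

Training a neural network follows the same loop, with the gradient computed over all weights by backpropagation and usually estimated on mini-batches.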
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Neural Network Policies | ||
|
||
1. Policy Gradients | ||
1. Exercise: implement a neural network policy | ||
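The policy-gradient idea can be sketched on a multi-armed bandit before moving to a neural network. The example below is a simplification: it uses a softmax over one preference per action instead of a network, and applies the REINFORCE update without a baseline. Function names and hyperparameters are assumptions.

```python
import math
import random


def softmax(prefs):
    """Convert preferences to probabilities (shifted by the max for stability)."""
    highest = max(prefs)
    exps = [math.exp(p - highest) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, steps=2000, learning_rate=0.1, seed=0):
    """Train a softmax policy on a bandit with fixed per-arm rewards."""
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        for a in range(len(prefs)):
            # Gradient of log pi(action) with respect to preference a.
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += learning_rate * reward * grad_log_pi
    return softmax(prefs)
```

In a neural network policy the preferences become the network's output logits and the same gradient of the log-probability is propagated back through the weights.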
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Deep Q-Learning | ||
|
||
1. Definitions and key concepts | ||
1. Deep Q-Learning implementation | ||
1. Exercise: implement a Lunar Lander agent using Deep Q-Learning (DQN) |
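Two building blocks commonly used in Deep Q-Learning are an experience replay buffer and a bootstrapped target computed from a separate target network. The sketch below keeps both framework-free: `target_q` is assumed to be any callable that returns a list of action values for a state, standing in for a real network, and all names are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_targets(batch, target_q, gamma=0.99):
    """Compute r + gamma * max_a' Q_target(s', a') for each transition."""
    targets = []
    for state, action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target_q(next_state))
        targets.append(reward + bootstrap)
    return targets
```

In a full agent, the online network would be trained to match these targets for the sampled actions, and the target network would be refreshed from the online one periodically.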
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Double Deep Q-Learning | ||
|
||
1. Definitions and key concepts | ||
1. What are the differences between Deep Q-Learning and Double Deep Q-Learning | ||
1. Exercise: implement a Double Deep Q-Learning agent and compare its results with Deep Q-Learning |
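The difference between the two methods can be seen in how the target is formed. In the sketch below, `online_q` and `target_q` are assumed to be callables that return a list of action values for a state, standing in for the two networks; the function names are illustrative.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def dqn_target(reward, next_state, done, target_q, gamma=0.99):
    """Deep Q-Learning: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(target_q(next_state))


def double_dqn_target(reward, next_state, done, online_q, target_q, gamma=0.99):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = argmax(online_q(next_state))
    return reward + gamma * target_q(next_state)[best_action]
```

Decoupling selection from evaluation is intended to reduce the overestimation that comes from taking a maximum over noisy value estimates.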
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Policy Optimization Algorithms (PPO) | ||
|
||
1. Definitions and key concepts | ||
1. Implementation | ||
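The core of PPO is its clipped surrogate objective. The sketch below computes it for plain Python lists of probability ratios (new policy over old policy) and advantage estimates; a real implementation would use tensors and add value and entropy terms, and the function name here is an assumption.

```python
def ppo_clip_objective(ratios, advantages, clip_epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over the batch."""
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - clip_epsilon), 1.0 + clip_epsilon)
        # The minimum makes the objective a pessimistic bound on the update.
        terms.append(min(ratio * advantage, clipped * advantage))
    return sum(terms) / len(terms)
```

With a positive advantage the objective stops growing once the ratio exceeds `1 + clip_epsilon`, and with a negative advantage it stops improving once the ratio falls below `1 - clip_epsilon`, which discourages large policy changes in a single update.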
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Implementation of RL using TF-Agents | ||
|
||
TBD | ||
|
||
## References | ||
|
||
* [TF-Agents](https://www.tensorflow.org/agents) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# Final Project | ||
|
||
TBD |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
#alunos ~ table td, #avaliacao ~ table td { | ||
  vertical-align: middle; | ||
} | ||
|
||
|
||
img.event-picture { | ||
  width: 40%; | ||
  height: 200px; | ||
  display: inline-block; | ||
  object-fit: cover; | ||
} | ||
|
||
.skill-icon > svg { | ||
  max-width: 40px !important; | ||
  max-height: 40px !important; | ||
|
||
  width: 40px !important; | ||
  height: 40px !important; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# Learning Goals | ||
|
||
At the end of the course, the student should be able to: | ||
|
||
1. Build a Reinforcement Learning system for sequential decision-making. | ||
1. Understand how to formalize your task as a Reinforcement Learning problem, and how to implement a solution. | ||
1. Understand the space of RL algorithms (Sarsa, Q-learning, Policy Gradients, and more). | ||
1. Understand how RL fits under the broader umbrella of machine learning, and how it complements supervised and unsupervised learning. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Reinforcement Learning - 2023/1 | ||
|
||
1. [Learning Goals](goals.md) | ||
2. [Plan](plan.md) | ||
3. [Student Assessment](assessment.md) | ||
|
||
## Class Schedule | ||
|
||
Tuesday and Thursday from 3:45 PM until 5:45 PM. | ||
|
||
## Extra period | ||
|
||
Thursday from 12:00 PM until 1:30 PM. | ||
|
||
## Contact information | ||
|
||
If you have any questions or comments, please send an e-mail to fabriciojb at insper dot edu dot br. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Plan | ||
|
||
The following activities are planned. The program is subject to change and adaptation as the course progresses. | ||
|
||
--8<-- "plan.md" |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
site_name: Reinforcement Learning | ||
repo_url: https://github.com/Insper/rl/ | ||
repo_name: Reinforcement Learning | ||
site_url: https://insper.github.io/rl/ | ||
|
||
theme: | ||
  name: 'material' | ||
|
||
extra_css: | ||
  - css/custom.css | ||
|
||
nav: | ||
  - 'Home': 'index.md' | ||
  - 'Goals': 'goals.md' | ||
  - 'Plan': 'plan.md' | ||
  - 'Student Assessment': 'assessment.md' | ||
  - 'Classes': | ||
      - 'classes/01_introduction/index.md' | ||
      - 'classes/02_problem_solving/index.md' | ||
      - 'classes/03_games/index.md' | ||
      - 'classes/04_toolings_envs/index.md' | ||
      - 'classes/05_q_learning/index.md' | ||
      - 'classes/07_sarsa/index.md' | ||
      - 'classes/08_evaluation/index.md' | ||
      - 'classes/09_non_determ/index.md' | ||
      - 'classes/10_game_env/index.md' | ||
      - 'classes/11_game_env_random/index.md' | ||
      - 'classes/12_more_complex/index.md' | ||
      - 'classes/13_nn_review/index.md' | ||
      - 'classes/14_nn_policies/index.md' | ||
      - 'classes/15_deep_q_learning/index.md' | ||
      - 'classes/16_double_deep_q_learning/index.md' | ||
      - 'classes/17_ppo/index.md' | ||
      - 'classes/18_tf_agents/index.md' | ||
      - 'classes/19_final_project/index.md' | ||
|
||
|
||
extra_javascript: | ||
  - https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.0.0/js-yaml.min.js | ||
  - js/markdown-enhancer.js | ||
  - javascripts/mathjax.js | ||
  - https://polyfill.io/v3/polyfill.min.js?features=es6 | ||
  - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js | ||
|
||
markdown_extensions: | ||
  - pymdownx.arithmatex: | ||
      generic: true | ||
  - attr_list | ||
  - markdown.extensions.admonition | ||
  - pymdownx.tasklist: | ||
      custom_checkbox: true | ||
  - pymdownx.details | ||
  - pymdownx.tabbed | ||
  - pymdownx.superfences | ||
  - pymdownx.magiclink | ||
  - pymdownx.critic: | ||
      mode: view | ||
  - pymdownx.betterem: | ||
      smart_enable: all | ||
  - pymdownx.caret | ||
  - pymdownx.mark | ||
  - pymdownx.tilde | ||
  - pymdownx.smartsymbols | ||
  - pymdownx.snippets: | ||
      base_path: "docs/_snippets" | ||
      check_paths: true | ||
  - pymdownx.emoji: | ||
      emoji_index: !!python/name:materialx.emoji.twemoji | ||
      emoji_generator: !!python/name:materialx.emoji.to_svg | ||
  - footnotes | ||
|
||
plugins: | ||
  - git-revision-date-localized | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
import tabulate | ||
import pandas as pd | ||
|
||
t1 = pd.read_excel('lessons_plan.xlsx') | ||
#t1['Data'] = t1['Data'].apply(lambda x: x.strftime('%d/%m')) | ||
|
||
with open('docs/_snippets/plan.md', 'w') as f: | ||
    tabela_str = tabulate.tabulate(t1[['Date', 'Content']], headers=['Date', 'Content'], tablefmt='pipe', showindex=False) | ||
    f.write(tabela_str) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
mkdocs-material | ||
mkdocs-git-revision-date-localized-plugin | ||
markdown | ||
pymdown-extensions | ||
tabulate | ||
requests | ||
pandas | ||
openpyxl | ||
pytest | ||
pylint |