initial structure
fbarth committed Feb 2, 2023
1 parent 7bd89e3 commit d9b92cc
Showing 33 changed files with 425 additions and 2 deletions.
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
.vscode

42 changes: 40 additions & 2 deletions README.md
@@ -1,2 +1,40 @@
# rl
Reinforcement Learning subject
# Reinforcement Learning

This repository contains the Reinforcement Learning subject material.

## Offerings

* 2023/1 - Fabrício Barth

## How to setup the environment

```bash
python3.7 -m virtualenv venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```

## How to compile slides

```bash
pandoc -t beamer slides.md -o slides.pdf
```

## How to deploy the web page

```bash
mkdocs gh-deploy
```

## How to run the web server locally

```bash
mkdocs serve
```

## How to publish the lessons plan

```bash
python publish_lessons_plan.py
```
36 changes: 36 additions & 0 deletions docs/_snippets/plan.md
@@ -0,0 +1,36 @@
| Date       | Content                                                        |
|:-----------|:---------------------------------------------------------------|
| 2023-02-07 | Introduction to Reinforcement Learning                         |
| 2023-02-09 | Problem-solving search review                                  |
| 2023-02-14 | Adversarial search and games review                            |
| 2023-02-16 | Reinforcement Learning Tooling and Environments                |
| 2023-02-23 | Q-Learning Algorithm                                           |
| 2023-02-28 | Q-Learning Algorithm                                           |
| 2023-03-02 | SARSA Algorithm                                                |
| 2023-03-07 | How to evaluate the performance of an agent?                   |
| 2023-03-09 | Using RL in non-deterministic environments                     |
| 2023-03-14 | Using RL in a competitive environment                          |
| 2023-03-16 | Using RL in a competitive environment with random behavior     |
| 2023-03-21 | Implementing an agent for a slightly more complex environment  |
| 2023-03-23 | Deep Neural Networks review                                    |
| 2023-03-28 | Deep Neural Networks review                                    |
| 2023-03-30 | Midterm assessment - we do not have class                      |
| 2023-04-04 | Midterm assessment - we do not have class                      |
| 2023-04-06 | We do not have class                                           |
| 2023-04-11 | Neural Network Policies                                        |
| 2023-04-13 | Deep Q-Learning                                                |
| 2023-04-18 | Deep Q-Learning                                                |
| 2023-04-20 | Double Deep Q-Learning                                         |
| 2023-04-25 | Double Deep Q-Learning                                         |
| 2023-04-27 | Policy Optimization Algorithms (PPO)                           |
| 2023-05-02 | Policy Optimization Algorithms (PPO)                           |
| 2023-05-04 | Implementation of RL using TF-Agents                           |
| 2023-05-09 | Implementation of RL using TF-Agents                           |
| 2023-05-11 | Final Project                                                  |
| 2023-05-16 | Final Project                                                  |
| 2023-05-18 | Final Project                                                  |
| 2023-05-23 | Final Project                                                  |
| 2023-05-25 | Final Project                                                  |
| 2023-05-30 | Final Project                                                  |
| 2023-06-01 | Final Assessment - we do not have class                        |
| 2023-06-06 | Final Assessment - we do not have class                        |
1 change: 1 addition & 0 deletions docs/assessment.md
@@ -0,0 +1 @@
# Student Assessment
19 changes: 19 additions & 0 deletions docs/classes/01_introduction/index.md
@@ -0,0 +1,19 @@
# Introduction to Reinforcement Learning

1. Definition and key concepts
1. Differences from other machine learning techniques
1. Real-world applications

1. How will this subject work?
1. Requirements
1. This is a hands-on subject!
1. Content
1. Assignments

## Activities for the next class

1. Read the chapter "II Problem-solving" from the AIMA book, or search the internet for problem-solving search algorithms.

## References

* xxxx
4 changes: 4 additions & 0 deletions docs/classes/02_problem_solving/index.md
@@ -0,0 +1,4 @@
# Problem-solving search review

1. Problem-solving search review
1. Exercise: implement a taxi driver agent
4 changes: 4 additions & 0 deletions docs/classes/03_games/index.md
@@ -0,0 +1,4 @@
# Adversarial search and games review

1. Adversarial search and games review
1. Exercise: implement a tic-tac-toe player.
6 changes: 6 additions & 0 deletions docs/classes/04_toolings_envs/index.md
@@ -0,0 +1,6 @@
# Reinforcement Learning Tooling and Environments

1. [The Farama Foundation](https://farama.org/Announcing-The-Farama-Foundation)
1. Other tools and environments.
1. How to use the [Gymnasium API](https://gymnasium.farama.org/).
1. Playing with the Gymnasium API.
5 changes: 5 additions & 0 deletions docs/classes/05_q_learning/index.md
@@ -0,0 +1,5 @@
# Q-Learning Algorithm

1. Definition and key concepts
1. Implementation
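The tabular Q-Learning update can be sketched in a few lines of plain Python. The chain environment, hyperparameters, and names below are illustrative assumptions, not part of the course material:

```python
import random
from collections import defaultdict

# Toy deterministic chain: states 0..4, actions 0 (left) and 1 (right);
# reaching state 4 yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = defaultdict(lambda: [0.0, 0.0])  # Q[state] -> [value(left), value(right)]

random.seed(0)
for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = step(state, action)
        # Q-Learning update: bootstrap from the greedy (max) next-state value
        target = reward + gamma * (0.0 if done else max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

# The learned greedy policy should move right in every non-terminal state
print([0 if Q[s][0] > Q[s][1] else 1 for s in range(GOAL)])
```

The same update loop works unchanged against a Gymnasium environment once `step` is replaced by `env.step`.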

7 changes: 7 additions & 0 deletions docs/classes/07_sarsa/index.md
@@ -0,0 +1,7 @@
# SARSA Algorithm

1. Definition and key concepts
1. The main difference between Q-Learning and SARSA
1. Implementation
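The difference between the two algorithms is only in the bootstrapped target. A minimal sketch, with hypothetical Q-values chosen to make the gap visible:

```python
# Both algorithms update Q(s, a) toward a bootstrapped target; they differ
# only in which next-state value they bootstrap from.
gamma = 0.9
Q = {("s1", "a1"): 0.0, ("s2", "a1"): 1.0, ("s2", "a2"): 4.0}

s, a, r, s_next = "s1", "a1", 1.0, "s2"
a_next = "a1"  # the action the (epsilon-greedy) policy actually chose in s_next

# Q-Learning (off-policy): bootstrap from the best next action
q_learning_target = r + gamma * max(Q[(s_next, b)] for b in ("a1", "a2"))

# SARSA (on-policy): bootstrap from the action actually taken next
sarsa_target = r + gamma * Q[(s_next, a_next)]

print(q_learning_target)  # 1 + 0.9 * 4.0 = 4.6
print(sarsa_target)       # 1 + 0.9 * 1.0 = 1.9
```

Because SARSA's target follows the behavior policy (including its exploratory actions), it tends to learn more conservative values than Q-Learning.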


5 changes: 5 additions & 0 deletions docs/classes/08_evaluation/index.md
@@ -0,0 +1,5 @@
# How to evaluate the performance of an agent?

1. Metrics
1. How to summarize results
1. Exercise: compare Q-Learning and SARSA algorithms considering a deterministic environment
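A simple way to summarize an agent's performance is the mean and standard deviation of episode returns, plus a moving average to show the trend over training. The returns below are made-up illustration data, not real results:

```python
import statistics

# Hypothetical episode returns collected from two agents over 8 episodes
returns = {
    "q_learning": [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "sarsa":      [0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}

def summarize(rs, window=4):
    # Mean and standard deviation summarize overall performance;
    # a moving average shows the learning trend over time.
    moving = [sum(rs[i:i + window]) / window for i in range(len(rs) - window + 1)]
    return statistics.mean(rs), statistics.stdev(rs), moving

for name, rs in returns.items():
    mean, std, moving = summarize(rs)
    print(f"{name}: mean={mean:.2f} std={std:.2f} moving={moving}")
```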
3 changes: 3 additions & 0 deletions docs/classes/09_non_determ/index.md
@@ -0,0 +1,3 @@
# Using RL in non-deterministic environments

1. Exercise: implement two agents for the Frozen Lake problem using the Q-Learning and SARSA algorithms, and compare the results.
4 changes: 4 additions & 0 deletions docs/classes/10_game_env/index.md
@@ -0,0 +1,4 @@
# Using RL in a competitive environment

1. Exercise: implement an agent that plays tic-tac-toe using the Q-Learning or SARSA algorithm, and show the results.

4 changes: 4 additions & 0 deletions docs/classes/11_game_env_random/index.md
@@ -0,0 +1,4 @@
# Using RL in a competitive environment with random behavior

1. Exercise: implement an agent that plays Blackjack using the Q-Learning or SARSA algorithm, and show the results.

4 changes: 4 additions & 0 deletions docs/classes/12_more_complex/index.md
@@ -0,0 +1,4 @@
# Implementing an agent for a slightly more complex environment

1. Exercise: implement an agent to solve the Mountain Car problem.
1. Discussion: how can we implement RL agents for environments like LunarLander, Atari games, and others?
5 changes: 5 additions & 0 deletions docs/classes/13_nn_review/index.md
@@ -0,0 +1,5 @@
# Deep Neural Networks review

1. Neural Networks
1. Gradient descent and optimization
1. Exercise: implement a neural network.
5 changes: 5 additions & 0 deletions docs/classes/14_nn_policies/index.md
@@ -0,0 +1,5 @@
# Neural Network Policies

1. Policy Gradients
1. Exercise: implement a neural network policy

5 changes: 5 additions & 0 deletions docs/classes/15_deep_q_learning/index.md
@@ -0,0 +1,5 @@
# Deep Q-Learning

1. Definitions and key concepts
1. Deep Q-Learning implementation
1. Exercise: implement a Lunar Lander agent using DQN.
5 changes: 5 additions & 0 deletions docs/classes/16_double_deep_q_learning/index.md
@@ -0,0 +1,5 @@
# Double Deep Q-Learning

1. Definitions and key concepts
1. What are the differences between Deep Q-Learning and Double Deep Q-Learning?
1. Exercise: implement Double Deep Q-Learning and compare the results with Deep Q-Learning.
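The core difference is how the bootstrapped target is built. The sketch below uses plain dicts as stand-ins for the online and target networks (in practice both are neural networks); all values are illustrative assumptions:

```python
gamma, reward = 0.99, 1.0
online_q = {"left": 2.0, "right": 5.0}   # online network overestimates "right"
target_q = {"left": 3.0, "right": 1.0}

# DQN: the target network both selects and evaluates the next action
dqn_target = reward + gamma * max(target_q.values())

# Double DQN: the online network selects, the target network evaluates,
# which decouples selection from evaluation and reduces overestimation bias
best_action = max(online_q, key=online_q.get)  # "right"
ddqn_target = reward + gamma * target_q[best_action]

print(dqn_target)   # 1 + 0.99 * 3.0 = 3.97
print(ddqn_target)  # 1 + 0.99 * 1.0 = 1.99
```

When the online network overestimates an action, Double DQN's target stays lower because the target network evaluates that same action independently.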
5 changes: 5 additions & 0 deletions docs/classes/17_ppo/index.md
@@ -0,0 +1,5 @@
# Policy Optimization Algorithms (PPO)

1. Definitions and key concepts
1. Implementation

7 changes: 7 additions & 0 deletions docs/classes/18_tf_agents/index.md
@@ -0,0 +1,7 @@
# Implementation of RL using TF-Agents

TBD

## References

* [TF-Agents](https://www.tensorflow.org/agents)
3 changes: 3 additions & 0 deletions docs/classes/19_final_project/index.md
@@ -0,0 +1,3 @@
# Final Project

TBD
19 changes: 19 additions & 0 deletions docs/css/custom.css
@@ -0,0 +1,19 @@
#alunos ~ table td, #avaliacao ~ table td {
    vertical-align: middle;
}

img.event-picture {
    width: 40%;
    height: 200px;
    display: inline-block;
    object-fit: cover;
}

.skill-icon > svg {
    max-width: 40px !important;
    max-height: 40px !important;

    width: 40px !important;
    height: 40px !important;
}
Binary file added docs/css/github.png
8 changes: 8 additions & 0 deletions docs/goals.md
@@ -0,0 +1,8 @@
# Learning Goals

At the end of the course, the student should be able to:

1. Build a Reinforcement Learning system for sequential decision-making.
1. Understand how to formalize a task as a Reinforcement Learning problem, and how to implement a solution.
1. Understand the space of RL algorithms (Sarsa, Q-learning, Policy Gradients, and more).
1. Understand how RL fits under the broader umbrella of machine learning, and how it complements supervised and unsupervised learning.
17 changes: 17 additions & 0 deletions docs/index.md
@@ -0,0 +1,17 @@
# Reinforcement Learning - 2023/1

1. [Learning Goals](goals.md)
2. [Plan](plan.md)
3. [Student Assessment](assessment.md)

## Class Schedule

Tuesday and Thursday from 3:45 PM until 5:45 PM.

## Extra period

Thursday from 12 PM until 1:30 PM.

## Contact information

If you have any questions or comments, please send an e-mail to fabriciojb at insper dot edu dot br.
5 changes: 5 additions & 0 deletions docs/plan.md
@@ -0,0 +1,5 @@
# Plan

The following activities are planned. The program is subject to change and adaptation as the course progresses.

--8<-- "plan.md"
Binary file added lessons_plan.xlsx
76 changes: 76 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,76 @@
site_name: Reinforcement Learning
repo_url: https://github.com/Insper/rl/
repo_name: Reinforcement Learning
site_url: https://insper.github.io/rl/

theme:
  name: 'material'

extra_css:
  - css/custom.css

nav:
  - 'Home': 'index.md'
  - 'Goals': 'goals.md'
  - 'Plan': 'plan.md'
  - 'Student Assessment': 'assessment.md'
  - 'Classes':
      - 'classes/01_introduction/index.md'
      - 'classes/02_problem_solving/index.md'
      - 'classes/03_games/index.md'
      - 'classes/04_toolings_envs/index.md'
      - 'classes/05_q_learning/index.md'
      - 'classes/07_sarsa/index.md'
      - 'classes/08_evaluation/index.md'
      - 'classes/09_non_determ/index.md'
      - 'classes/10_game_env/index.md'
      - 'classes/11_game_env_random/index.md'
      - 'classes/12_more_complex/index.md'
      - 'classes/13_nn_review/index.md'
      - 'classes/14_nn_policies/index.md'
      - 'classes/15_deep_q_learning/index.md'
      - 'classes/16_double_deep_q_learning/index.md'
      - 'classes/17_ppo/index.md'
      - 'classes/18_tf_agents/index.md'
      - 'classes/19_final_project/index.md'

extra_javascript:
  - https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.0.0/js-yaml.min.js
  - js/markdown-enhancer.js
  - javascripts/mathjax.js
  - https://polyfill.io/v3/polyfill.min.js?features=es6
  - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js

markdown_extensions:
  - pymdownx.arithmatex:
      generic: true
  - attr_list
  - markdown.extensions.admonition
  - pymdownx.tasklist:
      custom_checkbox: true
  - pymdownx.details
  - pymdownx.tabbed
  - pymdownx.superfences
  - pymdownx.magiclink
  - pymdownx.critic:
      mode: view
  - pymdownx.betterem:
      smart_enable: all
  - pymdownx.caret
  - pymdownx.mark
  - pymdownx.tilde
  - pymdownx.smartsymbols
  - pymdownx.snippets:
      base_path: "docs/_snippets"
      check_paths: true
  - pymdownx.emoji:
      emoji_index: !!python/name:materialx.emoji.twemoji
      emoji_generator: !!python/name:materialx.emoji.to_svg
  - footnotes

plugins:
  - git-revision-date-localized



9 changes: 9 additions & 0 deletions publish_lessons_plan.py
@@ -0,0 +1,9 @@
import tabulate
import pandas as pd

# Load the lessons plan and keep only the date part (drop the 00:00:00 time component)
t1 = pd.read_excel('lessons_plan.xlsx')
t1['Date'] = t1['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))

with open('docs/_snippets/plan.md', 'w') as f:
    tabela_str = tabulate.tabulate(t1[['Date', 'Content']], headers=['Date', 'Content'], tablefmt='pipe', showindex=False)
    f.write(tabela_str)
10 changes: 10 additions & 0 deletions requirements.txt
@@ -0,0 +1,10 @@
mkdocs-material
mkdocs-git-revision-date-localized-plugin
markdown
pymdown-extensions
tabulate
requests
pandas
openpyxl
pytest
pylint
