initial structure
fbarth committed Feb 2, 2023
1 parent 7bd89e3 commit d9b92cc
Showing 33 changed files with 425 additions and 2 deletions.
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
.vscode

42 changes: 40 additions & 2 deletions README.md
@@ -1,2 +1,40 @@
# rl
Reinforcement Learning subject
# Reinforcement Learning

This repository contains the Reinforcement Learning subject material.

## Offerings

* 2023/1 - Fabrício Barth

## How to setup the environment

```bash
python3.7 -m virtualenv venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```

## How to compile slides

```bash
pandoc -t beamer slides.md -o slides.pdf
```

## How to deploy the web page

```bash
mkdocs gh-deploy
```

## How to run the web server locally

```bash
mkdocs serve
```

## How to publish the lessons plan

```bash
python publish_lessons_plan.py
```
36 changes: 36 additions & 0 deletions docs/_snippets/plan.md
@@ -0,0 +1,36 @@
| Date       | Content                                                        |
|:-----------|:---------------------------------------------------------------|
| 2023-02-07 | Introduction to Reinforcement Learning                         |
| 2023-02-09 | Problem-solving search review                                  |
| 2023-02-14 | Adversarial search and games review                            |
| 2023-02-16 | Reinforcement Learning Tooling and Environments                |
| 2023-02-23 | Q-Learning Algorithm                                           |
| 2023-02-28 | Q-Learning Algorithm                                           |
| 2023-03-02 | SARSA Algorithm                                                |
| 2023-03-07 | How to evaluate the performance of an agent?                   |
| 2023-03-09 | Using RL in non-deterministic environments                     |
| 2023-03-14 | Using RL in a competitive environment                          |
| 2023-03-16 | Using RL in a competitive environment with random behavior     |
| 2023-03-21 | Implementing an agent for a slightly more complex environment  |
| 2023-03-23 | Deep Neural Networks review                                    |
| 2023-03-28 | Deep Neural Networks review                                    |
| 2023-03-30 | Midterm assessment - we do not have class                      |
| 2023-04-04 | Midterm assessment - we do not have class                      |
| 2023-04-06 | We do not have class                                           |
| 2023-04-11 | Neural Network Policies                                        |
| 2023-04-13 | Deep Q-Learning                                                |
| 2023-04-18 | Deep Q-Learning                                                |
| 2023-04-20 | Double Deep Q-Learning                                         |
| 2023-04-25 | Double Deep Q-Learning                                         |
| 2023-04-27 | Policy Optimization Algorithms (PPO)                           |
| 2023-05-02 | Policy Optimization Algorithms (PPO)                           |
| 2023-05-04 | Implementation of RL using TF-Agents                           |
| 2023-05-09 | Implementation of RL using TF-Agents                           |
| 2023-05-11 | Final Project                                                  |
| 2023-05-16 | Final Project                                                  |
| 2023-05-18 | Final Project                                                  |
| 2023-05-23 | Final Project                                                  |
| 2023-05-25 | Final Project                                                  |
| 2023-05-30 | Final Project                                                  |
| 2023-06-01 | Final Assessment - we do not have class                        |
| 2023-06-06 | Final Assessment - we do not have class                        |
1 change: 1 addition & 0 deletions docs/assessment.md
@@ -0,0 +1 @@
# Student Assessment
19 changes: 19 additions & 0 deletions docs/classes/01_introduction/index.md
@@ -0,0 +1,19 @@
# Introduction to Reinforcement Learning

1. Definition and key concepts
1. Differences from other machine learning techniques
1. Real-world applications

1. How will this subject work?
1. Requirements
1. This is a hands-on subject!
1. Content
1. Assignments

## Activities for the next class

1. Read the chapter "II Problem-solving" from the AIMA book, or search the internet for problem-solving search algorithms.

## References

* xxxx
4 changes: 4 additions & 0 deletions docs/classes/02_problem_solving/index.md
@@ -0,0 +1,4 @@
# Problem-solving search review

1. Problem-solving search review
1. Exercise: implement a taxi driver agent
4 changes: 4 additions & 0 deletions docs/classes/03_games/index.md
@@ -0,0 +1,4 @@
# Adversarial search and games review

1. Adversarial search and games review
1. Exercise: implement a tic-tac-toe player.
6 changes: 6 additions & 0 deletions docs/classes/04_toolings_envs/index.md
@@ -0,0 +1,6 @@
# Reinforcement Learning Tooling and Environments

1. [The Farama Foundation](https://farama.org/Announcing-The-Farama-Foundation)
1. Other tools and environments.
1. How to use the [Gymnasium API](https://gymnasium.farama.org/).
1. Playing with the Gymnasium API.
5 changes: 5 additions & 0 deletions docs/classes/05_q_learning/index.md
@@ -0,0 +1,5 @@
# Q-Learning Algorithm

1. Definition and key concepts
1. Implementation
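The tabular Q-Learning update can be sketched in a few lines of plain Python. The chain environment, hyperparameters, and names below are illustrative assumptions, not part of the course material:

```python
import random
from collections import defaultdict

# Toy deterministic chain: states 0..4, actions 0 (left) and 1 (right);
# reaching state 4 yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = defaultdict(lambda: [0.0, 0.0])  # Q[state] -> [value(left), value(right)]

random.seed(0)
for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = step(state, action)
        # Q-Learning update: bootstrap from the greedy (max) next-state value
        target = reward + gamma * (0.0 if done else max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

# The learned greedy policy should move right in every non-terminal state
print([0 if Q[s][0] > Q[s][1] else 1 for s in range(GOAL)])
```

The same update loop works unchanged against a Gymnasium environment once `step` is replaced by `env.step`.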

7 changes: 7 additions & 0 deletions docs/classes/07_sarsa/index.md
@@ -0,0 +1,7 @@
# SARSA Algorithm

1. Definition and key concepts
1. The main difference between Q-Learning and SARSA
1. Implementation
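The difference between the two algorithms is only in the bootstrapped target. A minimal sketch, with hypothetical Q-values chosen to make the gap visible:

```python
# Both algorithms update Q(s, a) toward a bootstrapped target; they differ
# only in which next-state value they bootstrap from.
gamma = 0.9
Q = {("s1", "a1"): 0.0, ("s2", "a1"): 1.0, ("s2", "a2"): 4.0}

s, a, r, s_next = "s1", "a1", 1.0, "s2"
a_next = "a1"  # the action the (epsilon-greedy) policy actually chose in s_next

# Q-Learning (off-policy): bootstrap from the best next action
q_learning_target = r + gamma * max(Q[(s_next, b)] for b in ("a1", "a2"))

# SARSA (on-policy): bootstrap from the action actually taken next
sarsa_target = r + gamma * Q[(s_next, a_next)]

print(q_learning_target)  # 1 + 0.9 * 4.0 = 4.6
print(sarsa_target)       # 1 + 0.9 * 1.0 = 1.9
```

Because SARSA's target follows the behavior policy (including its exploratory actions), it tends to learn more conservative values than Q-Learning.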


5 changes: 5 additions & 0 deletions docs/classes/08_evaluation/index.md
@@ -0,0 +1,5 @@
# How to evaluate the performance of an agent?

1. Metrics
1. How to summarize results
1. Exercise: compare Q-Learning and SARSA algorithms considering a deterministic environment
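A simple way to summarize an agent's performance is the mean and standard deviation of episode returns, plus a moving average to show the trend over training. The returns below are made-up illustration data, not real results:

```python
import statistics

# Hypothetical episode returns collected from two agents over 8 episodes
returns = {
    "q_learning": [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "sarsa":      [0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}

def summarize(rs, window=4):
    # Mean and standard deviation summarize overall performance;
    # a moving average shows the learning trend over time.
    moving = [sum(rs[i:i + window]) / window for i in range(len(rs) - window + 1)]
    return statistics.mean(rs), statistics.stdev(rs), moving

for name, rs in returns.items():
    mean, std, moving = summarize(rs)
    print(f"{name}: mean={mean:.2f} std={std:.2f} moving={moving}")
```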
3 changes: 3 additions & 0 deletions docs/classes/09_non_determ/index.md
@@ -0,0 +1,3 @@
# Using RL in non-deterministic environments

1. Exercise: implement two agents for the Frozen Lake problem using the Q-Learning and SARSA algorithms, and compare the results.
4 changes: 4 additions & 0 deletions docs/classes/10_game_env/index.md
@@ -0,0 +1,4 @@
# Using RL in a competitive environment

1. Exercise: implement an agent that plays tic-tac-toe using the Q-Learning or SARSA algorithm, and show the results.

4 changes: 4 additions & 0 deletions docs/classes/11_game_env_random/index.md
@@ -0,0 +1,4 @@
# Using RL in a competitive environment with random behavior

1. Exercise: implement an agent that plays Blackjack using the Q-Learning or SARSA algorithm, and show the results.

4 changes: 4 additions & 0 deletions docs/classes/12_more_complex/index.md
@@ -0,0 +1,4 @@
# Implementing an agent for a slightly more complex environment

1. Exercise: implement an agent to solve the Mountain Car problem.
1. Discussion: how can we implement RL agents for environments like LunarLander, Atari games, and others?
5 changes: 5 additions & 0 deletions docs/classes/13_nn_review/index.md
@@ -0,0 +1,5 @@
# Deep Neural Networks review

1. Neural Networks
1. Gradient descent and optimization
1. Exercise: implement a neural network.
5 changes: 5 additions & 0 deletions docs/classes/14_nn_policies/index.md
@@ -0,0 +1,5 @@
# Neural Network Policies

1. Policy Gradients
1. Exercise: implement a neural network policy

5 changes: 5 additions & 0 deletions docs/classes/15_deep_q_learning/index.md
@@ -0,0 +1,5 @@
# Deep Q-Learning

1. Definitions and key concepts
1. Deep Q-Learning implementation
1. Exercise: implement a Lunar Lander agent using DQN.
5 changes: 5 additions & 0 deletions docs/classes/16_double_deep_q_learning/index.md
@@ -0,0 +1,5 @@
# Double Deep Q-Learning

1. Definitions and key concepts
1. What are the differences between Deep Q-Learning and Double Deep Q-Learning?
1. Exercise: implement Double Deep Q-Learning and compare the results with Deep Q-Learning.
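The core difference is how the bootstrapped target is built. The sketch below uses plain dicts as stand-ins for the online and target networks (in practice both are neural networks); all values are illustrative assumptions:

```python
gamma, reward = 0.99, 1.0
online_q = {"left": 2.0, "right": 5.0}   # online network overestimates "right"
target_q = {"left": 3.0, "right": 1.0}

# DQN: the target network both selects and evaluates the next action
dqn_target = reward + gamma * max(target_q.values())

# Double DQN: the online network selects, the target network evaluates,
# which decouples selection from evaluation and reduces overestimation bias
best_action = max(online_q, key=online_q.get)  # "right"
ddqn_target = reward + gamma * target_q[best_action]

print(dqn_target)   # 1 + 0.99 * 3.0 = 3.97
print(ddqn_target)  # 1 + 0.99 * 1.0 = 1.99
```

When the online network overestimates an action, Double DQN's target stays lower because the target network evaluates that same action independently.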
5 changes: 5 additions & 0 deletions docs/classes/17_ppo/index.md
@@ -0,0 +1,5 @@
# Policy Optimization Algorithms (PPO)

1. Definitions and key concepts
1. Implementation

7 changes: 7 additions & 0 deletions docs/classes/18_tf_agents/index.md
@@ -0,0 +1,7 @@
# Implementation of RL using TF-Agents

TBD

## References

* [TF-Agents](https://www.tensorflow.org/agents)
3 changes: 3 additions & 0 deletions docs/classes/19_final_project/index.md
@@ -0,0 +1,3 @@
# Final Project

TBD
19 changes: 19 additions & 0 deletions docs/css/custom.css
@@ -0,0 +1,19 @@
#alunos ~ table td, #avaliacao ~ table td {
    vertical-align: middle;
}

img.event-picture {
    width: 40%;
    height: 200px;
    display: inline-block;
    object-fit: cover;
}

.skill-icon > svg {
    max-width: 40px !important;
    max-height: 40px !important;

    width: 40px !important;
    height: 40px !important;
}
Binary file added docs/css/github.png
8 changes: 8 additions & 0 deletions docs/goals.md
@@ -0,0 +1,8 @@
# Learning Goals

At the end of the course, the student should be able to:

1. Build a Reinforcement Learning system for sequential decision-making.
1. Understand how to formalize a task as a Reinforcement Learning problem, and how to implement a solution.
1. Understand the space of RL algorithms (Sarsa, Q-learning, Policy Gradients, and more).
1. Understand how RL fits under the broader umbrella of machine learning, and how it complements supervised and unsupervised learning.
17 changes: 17 additions & 0 deletions docs/index.md
@@ -0,0 +1,17 @@
# Reinforcement Learning - 2023/1

1. [Learning Goals](goals.md)
2. [Plan](plan.md)
3. [Student Assessment](assessment.md)

## Class Schedule

Tuesday and Thursday from 3:45 PM until 5:45 PM.

## Extra period

Thursday from 12 PM until 1:30 PM.

## Contact information

If you have any questions or comments, please send an e-mail to fabriciojb at insper dot edu dot br.
5 changes: 5 additions & 0 deletions docs/plan.md
@@ -0,0 +1,5 @@
# Plan

The following activities are planned. The program is subject to change and adaptation as the course progresses.

--8<-- "plan.md"
Binary file added lessons_plan.xlsx
76 changes: 76 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,76 @@
site_name: Reinforcement Learning
repo_url: https://github.com/Insper/rl/
repo_name: Reinforcement Learning
site_url: https://insper.github.io/rl/

theme:
  name: 'material'

extra_css:
  - css/custom.css

nav:
  - 'Home': 'index.md'
  - 'Goals': 'goals.md'
  - 'Plan': 'plan.md'
  - 'Student Assessment': 'assessment.md'
  - 'Classes':
      - 'classes/01_introduction/index.md'
      - 'classes/02_problem_solving/index.md'
      - 'classes/03_games/index.md'
      - 'classes/04_toolings_envs/index.md'
      - 'classes/05_q_learning/index.md'
      - 'classes/07_sarsa/index.md'
      - 'classes/08_evaluation/index.md'
      - 'classes/09_non_determ/index.md'
      - 'classes/10_game_env/index.md'
      - 'classes/11_game_env_random/index.md'
      - 'classes/12_more_complex/index.md'
      - 'classes/13_nn_review/index.md'
      - 'classes/14_nn_policies/index.md'
      - 'classes/15_deep_q_learning/index.md'
      - 'classes/16_double_deep_q_learning/index.md'
      - 'classes/17_ppo/index.md'
      - 'classes/18_tf_agents/index.md'
      - 'classes/19_final_project/index.md'

extra_javascript:
  - https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.0.0/js-yaml.min.js
  - js/markdown-enhancer.js
  - javascripts/mathjax.js
  - https://polyfill.io/v3/polyfill.min.js?features=es6
  - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js

markdown_extensions:
  - pymdownx.arithmatex:
      generic: true
  - attr_list
  - markdown.extensions.admonition
  - pymdownx.tasklist:
      custom_checkbox: true
  - pymdownx.details
  - pymdownx.tabbed
  - pymdownx.superfences
  - pymdownx.magiclink
  - pymdownx.critic:
      mode: view
  - pymdownx.betterem:
      smart_enable: all
  - pymdownx.caret
  - pymdownx.mark
  - pymdownx.tilde
  - pymdownx.smartsymbols
  - pymdownx.snippets:
      base_path: "docs/_snippets"
      check_paths: true
  - pymdownx.emoji:
      emoji_index: !!python/name:materialx.emoji.twemoji
      emoji_generator: !!python/name:materialx.emoji.to_svg
  - footnotes

plugins:
  - git-revision-date-localized



9 changes: 9 additions & 0 deletions publish_lessons_plan.py
@@ -0,0 +1,9 @@
import tabulate
import pandas as pd

# Load the lessons plan and keep only the date part (drop the 00:00:00 time component)
t1 = pd.read_excel('lessons_plan.xlsx')
t1['Date'] = t1['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))

with open('docs/_snippets/plan.md', 'w') as f:
    tabela_str = tabulate.tabulate(t1[['Date', 'Content']], headers=['Date', 'Content'], tablefmt='pipe', showindex=False)
    f.write(tabela_str)
10 changes: 10 additions & 0 deletions requirements.txt
@@ -0,0 +1,10 @@
mkdocs-material
mkdocs-git-revision-date-localized-plugin
markdown
pymdown-extensions
tabulate
requests
pandas
openpyxl
pytest
pylint
