diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..faa1838 --- /dev/null +++ b/.gitignore @@ -0,0 +1,2 @@
+.vscode
+
diff --git a/README.md b/README.md index c6e7be1..75b131b 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,40 @@
-# rl
-Reinforcement Learning subject
+# Reinforcement Learning
+
+This repository contains the Reinforcement Learning subject material.
+
+## Offerings
+
+* 2023/1 - Fabrício Barth
+
+## How to set up the environment
+
+```bash
+python3.7 -m virtualenv venv
+source venv/bin/activate
+python -m pip install --upgrade pip
+pip install -r requirements.txt
+```
+
+## How to compile slides
+
+```bash
+pandoc -t beamer slides.md -o slides.pdf
+```
+
+## How to deploy the web page
+
+```bash
+mkdocs gh-deploy
+```
+
+## How to run the web server locally
+
+```bash
+mkdocs serve
+```
+
+## How to publish the lessons plan
+
+```bash
+python publish_lessons_plan.py
+```
diff --git a/docs/_snippets/plan.md b/docs/_snippets/plan.md new file mode 100644 index 0000000..95b62f7 --- /dev/null +++ b/docs/_snippets/plan.md @@ -0,0 +1,36 @@
+| Date       | Content                                                                  |
+|:-----------|:------------------------------------------------------------------------|
+| 2023-02-07 | Introduction to Reinforcement Learning                                   |
+| 2023-02-09 | Problem-solving searching review                                         |
+| 2023-02-14 | Adversarial search and games review                                      |
+| 2023-02-16 | Reinforcement Learning Tooling and Environments                          |
+| 2023-02-23 | Q-Learning Algorithm                                                     |
+| 2023-02-28 | Q-Learning Algorithm                                                     |
+| 2023-03-02 | SARSA Algorithm                                                          |
+| 2023-03-07 | How to evaluate the performance of an agent?                             |
| +| 2023-03-09 00:00:00 | Using RL in non-deterministic environments | +| 2023-03-14 00:00:00 | Using RL in a competitive environment | +| 2023-03-16 00:00:00 | Using RL in a competitive environment with random behavior | +| 2023-03-21 00:00:00 | Implementing an agent to deal with an environment a little more complex | +| 2023-03-23 00:00:00 | Deep Neural Networks review | +| 2023-03-28 00:00:00 | Deep Neural Networks review | +| 2023-03-30 00:00:00 | Midterm assessment - we do not have class | +| 2023-04-04 00:00:00 | Midterm assessment - we do not have class | +| 2023-04-06 00:00:00 | We do not have class | +| 2023-04-11 00:00:00 | Neural Network Policies | +| 2023-04-13 00:00:00 | Deep Q-Learning | +| 2023-04-18 00:00:00 | Deep Q-Learning | +| 2023-04-20 00:00:00 | Double Deep Q-Learning | +| 2023-04-25 00:00:00 | Double Deep Q-Learning | +| 2023-04-27 00:00:00 | Policy Optimization Algorithms (PPO) | +| 2023-05-02 00:00:00 | Policy Optimization Algorithms (PPO) | +| 2023-05-04 00:00:00 | Implementation of RL using TF-Agents | +| 2023-05-09 00:00:00 | Implementation of RL using TF-Agents | +| 2023-05-11 00:00:00 | Final Project | +| 2023-05-16 00:00:00 | Final Project | +| 2023-05-18 00:00:00 | Final Project | +| 2023-05-23 00:00:00 | Final Project | +| 2023-05-25 00:00:00 | Final Project | +| 2023-05-30 00:00:00 | Final Project | +| 2023-06-01 00:00:00 | Final Assessment - we do not have class | +| 2023-06-06 00:00:00 | Final Assessment - we do not have class | \ No newline at end of file diff --git a/docs/assessment.md b/docs/assessment.md new file mode 100644 index 0000000..b0eb94b --- /dev/null +++ b/docs/assessment.md @@ -0,0 +1 @@ +# Student Assessment diff --git a/docs/classes/01_introduction/index.md b/docs/classes/01_introduction/index.md new file mode 100644 index 0000000..d40aadd --- /dev/null +++ b/docs/classes/01_introduction/index.md @@ -0,0 +1,19 @@ +# Introduction to Reinforcement Learning + +1. Definition and key concepts +1. 
Differences from other machine learning techniques
+1. Real-world applications
+
+1. How will this subject work?
+    1. Requirements
+    1. This is a hands-on subject!
+    1. Content
+    1. Assignments
+
+## Activities for the next class
+
+1. Read the chapter "II Problem-solving" from the AIMA book, or search the internet for problem-solving search algorithms.
+
+## References
+
+* xxxx
\ No newline at end of file
diff --git a/docs/classes/02_problem_solving/index.md b/docs/classes/02_problem_solving/index.md new file mode 100644 index 0000000..9366110 --- /dev/null +++ b/docs/classes/02_problem_solving/index.md @@ -0,0 +1,4 @@
+# Problem-solving searching review
+
+1. Problem-solving searching review
+1. Exercise: the implementation of a taxi driver agent
diff --git a/docs/classes/03_games/index.md b/docs/classes/03_games/index.md new file mode 100644 index 0000000..4e3ddec --- /dev/null +++ b/docs/classes/03_games/index.md @@ -0,0 +1,4 @@
+# Adversarial search and games review
+
+1. Adversarial search and games review
+1. Exercise: the implementation of a tic-tac-toe player.
diff --git a/docs/classes/04_toolings_envs/index.md b/docs/classes/04_toolings_envs/index.md new file mode 100644 index 0000000..cd1964a --- /dev/null +++ b/docs/classes/04_toolings_envs/index.md @@ -0,0 +1,6 @@
+# Reinforcement Learning Tooling and Environments
+
+1. [The Farama Foundation](https://farama.org/Announcing-The-Farama-Foundation)
+1. Other tools and environments.
+1. How to use the [Gymnasium API](https://gymnasium.farama.org/).
+1. Playing with the Gymnasium API.
diff --git a/docs/classes/05_q_learning/index.md b/docs/classes/05_q_learning/index.md new file mode 100644 index 0000000..61919b1 --- /dev/null +++ b/docs/classes/05_q_learning/index.md @@ -0,0 +1,5 @@
+# Q-Learning Algorithm
+
+1. Definition and key concepts
+1. 
Implementation
+
diff --git a/docs/classes/07_sarsa/index.md b/docs/classes/07_sarsa/index.md new file mode 100644 index 0000000..3fc027f --- /dev/null +++ b/docs/classes/07_sarsa/index.md @@ -0,0 +1,7 @@
+# SARSA Algorithm
+
+1. Definition and key concepts
+1. The main difference between Q-Learning and SARSA
+1. Implementation
+
+
diff --git a/docs/classes/08_evaluation/index.md b/docs/classes/08_evaluation/index.md new file mode 100644 index 0000000..660a3bd --- /dev/null +++ b/docs/classes/08_evaluation/index.md @@ -0,0 +1,5 @@
+# How to evaluate the performance of an agent?
+
+1. Metrics
+1. How to summarize results
+1. Exercise: compare the Q-Learning and SARSA algorithms in a deterministic environment
\ No newline at end of file
diff --git a/docs/classes/09_non_determ/index.md b/docs/classes/09_non_determ/index.md new file mode 100644 index 0000000..0411122 --- /dev/null +++ b/docs/classes/09_non_determ/index.md @@ -0,0 +1,3 @@
+# Using RL in non-deterministic environments
+
+1. Exercise: implement two agents for the Frozen Lake problem using the Q-Learning and Sarsa algorithms and compare the results
diff --git a/docs/classes/10_game_env/index.md b/docs/classes/10_game_env/index.md new file mode 100644 index 0000000..175c9a7 --- /dev/null +++ b/docs/classes/10_game_env/index.md @@ -0,0 +1,4 @@
+# Using RL in a competitive environment
+
+1. Exercise: implement an agent to play tic-tac-toe using the Q-Learning or Sarsa algorithms and show the results.
+
diff --git a/docs/classes/11_game_env_random/index.md b/docs/classes/11_game_env_random/index.md new file mode 100644 index 0000000..84c6f1d --- /dev/null +++ b/docs/classes/11_game_env_random/index.md @@ -0,0 +1,4 @@
+# Using RL in a competitive environment with random behavior
+
+1. Exercise: implement an agent to play Blackjack using the Q-Learning or Sarsa algorithms and show the results. 
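The tabular exercises above all revolve around the same Q-Learning update rule. The following is a minimal, dependency-free sketch of that update on a toy five-state corridor; the environment, hyperparameters, and episode count are illustrative only, not part of the course material:

```python
import random

# Toy deterministic corridor: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 ends the episode with reward 1; every other step gives 0.
# (Illustrative environment only -- the course exercises use Gymnasium environments.)
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

alpha, gamma, epsilon = 0.5, 0.9, 0.1  # illustrative hyperparameters
Q = [[0.0, 0.0] for _ in range(N_STATES)]

random.seed(0)
for _ in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = Q[state].index(max(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-Learning update: bootstrap from the best action in the next state
        target = reward + gamma * max(Q[next_state]) * (not done)
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

print([q.index(max(q)) for q in Q[:GOAL]])  # greedy policy: 1 (right) in every state
```

The same update loop carries over to the exercises; only the environment interface (e.g. Gymnasium's `reset`/`step`) and the Q-table indexing change.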
+
diff --git a/docs/classes/12_more_complex/index.md b/docs/classes/12_more_complex/index.md new file mode 100644 index 0000000..9e95fa1 --- /dev/null +++ b/docs/classes/12_more_complex/index.md @@ -0,0 +1,4 @@
+# Implementing an agent to deal with an environment a little more complex
+
+1. Exercise: implement an agent to run a mountain car.
+1. Discussion: how can we implement agents using RL for environments like LunarLander, Atari, and others?
\ No newline at end of file
diff --git a/docs/classes/13_nn_review/index.md b/docs/classes/13_nn_review/index.md new file mode 100644 index 0000000..5a5d26a --- /dev/null +++ b/docs/classes/13_nn_review/index.md @@ -0,0 +1,5 @@
+# Deep Neural Networks review
+
+1. Neural Networks
+1. Gradient descent and optimization
+1. Exercise: implement a neural network.
\ No newline at end of file
diff --git a/docs/classes/14_nn_policies/index.md b/docs/classes/14_nn_policies/index.md new file mode 100644 index 0000000..835df2f --- /dev/null +++ b/docs/classes/14_nn_policies/index.md @@ -0,0 +1,5 @@
+# Neural Network Policies
+
+1. Policy Gradients
+1. Exercise: implement a neural network policy
+
diff --git a/docs/classes/15_deep_q_learning/index.md b/docs/classes/15_deep_q_learning/index.md new file mode 100644 index 0000000..e906f7b --- /dev/null +++ b/docs/classes/15_deep_q_learning/index.md @@ -0,0 +1,5 @@
+# Deep Q-Learning
+
+1. Definitions and key concepts
+1. Deep Q-Learning implementation
+1. Exercise: implement a Lunar Lander agent using DQN
\ No newline at end of file
diff --git a/docs/classes/16_double_deep_q_learning/index.md b/docs/classes/16_double_deep_q_learning/index.md new file mode 100644 index 0000000..31f59c9 --- /dev/null +++ b/docs/classes/16_double_deep_q_learning/index.md @@ -0,0 +1,5 @@
+# Double Deep Q-Learning
+
+1. Definitions and key concepts
+1. What are the differences between Deep Q-Learning and Double Deep Q-Learning?
+1. 
Exercise: implement Double Deep Q-Learning and compare its results with Deep Q-Learning
diff --git a/docs/classes/17_ppo/index.md b/docs/classes/17_ppo/index.md new file mode 100644 index 0000000..8ea73f5 --- /dev/null +++ b/docs/classes/17_ppo/index.md @@ -0,0 +1,5 @@
+# Policy Optimization Algorithms (PPO)
+
+1. Definitions and key concepts
+1. Implementation
+
diff --git a/docs/classes/18_tf_agents/index.md b/docs/classes/18_tf_agents/index.md new file mode 100644 index 0000000..e86642e --- /dev/null +++ b/docs/classes/18_tf_agents/index.md @@ -0,0 +1,7 @@
+# Implementation of RL using TF-Agents
+
+TBD
+
+## References
+
+* [TF-Agents](https://www.tensorflow.org/agents)
\ No newline at end of file
diff --git a/docs/classes/19_final_project/index.md b/docs/classes/19_final_project/index.md new file mode 100644 index 0000000..216f641 --- /dev/null +++ b/docs/classes/19_final_project/index.md @@ -0,0 +1,3 @@
+# Final Project
+
+TBD
\ No newline at end of file
diff --git a/docs/css/custom.css b/docs/css/custom.css new file mode 100644 index 0000000..0022aa8 --- /dev/null +++ b/docs/css/custom.css @@ -0,0 +1,19 @@
+#alunos ~ table td, #avaliacao ~ table td {
+    vertical-align: middle;
+}
+
+
+img.event-picture {
+    width: 40%;
+    height: 200px;
+    display: inline-block;
+    object-fit: cover;
+}
+
+.skill-icon > svg {
+    max-width: 40px !important;
+    max-height: 40px !important;
+
+    width: 40px !important;
+    height: 40px !important;
+}
\ No newline at end of file
diff --git a/docs/css/github.png b/docs/css/github.png new file mode 100644 index 0000000..8b25551 Binary files /dev/null and b/docs/css/github.png differ
diff --git a/docs/goals.md b/docs/goals.md new file mode 100644 index 0000000..d05e8d0 --- /dev/null +++ b/docs/goals.md @@ -0,0 +1,8 @@
+# Learning Goals
+
+At the end of the course, the student should be able to:
+
+1. Build a Reinforcement Learning system for sequential decision-making.
+1. 
Understand how to formalize your task as a Reinforcement Learning problem, and how to implement a solution.
+1. Understand the space of RL algorithms (Sarsa, Q-learning, Policy Gradients, and more).
+1. Understand how RL fits under the broader umbrella of machine learning, and how it complements supervised and unsupervised learning.
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..c32ff3a --- /dev/null +++ b/docs/index.md @@ -0,0 +1,17 @@
+# Reinforcement Learning - 2023/1
+
+1. [Learning Goals](goals.md)
+2. [Plan](plan.md)
+3. [Student Assessment](assessment.md)
+
+## Class Schedule
+
+Tuesday and Thursday from 3:45 PM until 5:45 PM.
+
+## Extra period
+
+Thursday from 12 PM until 1:30 PM.
+
+## Contact information
+
+If you have any questions or comments, please send an e-mail to fabriciojb at insper dot edu dot br.
\ No newline at end of file
diff --git a/docs/plan.md b/docs/plan.md new file mode 100644 index 0000000..1f4d395 --- /dev/null +++ b/docs/plan.md @@ -0,0 +1,5 @@
+# Plan
+
+The following activities are planned. The plan is subject to change and adaptation as the course progresses. 
+ +--8<-- "plan.md" \ No newline at end of file diff --git a/lessons_plan.xlsx b/lessons_plan.xlsx new file mode 100644 index 0000000..ed2fce9 Binary files /dev/null and b/lessons_plan.xlsx differ diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 0000000..61e22e7 --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,76 @@ +site_name: Reinforcement Learning +repo_url: https://github.com/Insper/rl/ +repo_name: Reinforcement Learning +site_url: https://insper.github.io/rl/ + +theme: + name: 'material' + +extra_css: + - css/custom.css + +nav: + - 'Home': 'index.md' + - 'Goals': 'goals.md' + - 'Plan': 'plan.md' + - 'Student Assessment': 'assessment.md' + - 'Classes': + - 'classes/01_introduction/index.md' + - 'classes/02_problem_solving/index.md' + - 'classes/03_games/index.md' + - 'classes/04_toolings_envs/index.md' + - 'classes/05_q_learning/index.md' + - 'classes/07_sarsa/index.md' + - 'classes/08_evaluation/index.md' + - 'classes/09_non_determ/index.md' + - 'classes/10_game_env/index.md' + - 'classes/11_game_env_random/index.md' + - 'classes/12_more_complex/index.md' + - 'classes/13_nn_review/index.md' + - 'classes/14_nn_policies/index.md' + - 'classes/15_deep_q_learning/index.md' + - 'classes/16_double_deep_q_learning/index.md' + - 'classes/17_ppo/index.md' + - 'classes/18_tf_agents/index.md' + - 'classes/19_final_project/index.md' + + +extra_javascript: + - https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.0.0/js-yaml.min.js + - js/markdown-enhancer.js + - javascripts/mathjax.js + - https://polyfill.io/v3/polyfill.min.js?features=es6 + - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js + +markdown_extensions: + - pymdownx.arithmatex: + generic: true + - attr_list + - markdown.extensions.admonition + - pymdownx.tasklist: + custom_checkbox: true + - pymdownx.details + - pymdownx.tabbed + - pymdownx.superfences + - pymdownx.magiclink + - pymdownx.critic: + mode: view + - pymdownx.betterem: + smart_enable: all + - pymdownx.caret + - pymdownx.mark + - 
pymdownx.tilde
+  - pymdownx.smartsymbols
+  - pymdownx.snippets:
+      base_path: "docs/_snippets"
+      check_paths: true
+  - pymdownx.emoji:
+      emoji_index: !!python/name:materialx.emoji.twemoji
+      emoji_generator: !!python/name:materialx.emoji.to_svg
+  - footnotes
+
+plugins:
+  - git-revision-date-localized
+
+
+
diff --git a/publish_lessons_plan.py b/publish_lessons_plan.py new file mode 100644 index 0000000..a7abb35 --- /dev/null +++ b/publish_lessons_plan.py @@ -0,0 +1,9 @@
+import tabulate
+import pandas as pd
+
+t1 = pd.read_excel('lessons_plan.xlsx')
+t1['Date'] = t1['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))  # format dates for the published table
+
+with open('docs/_snippets/plan.md', 'w') as f:
+    tabela_str = tabulate.tabulate(t1[['Date', 'Content']], headers=['Date', 'Content'], tablefmt='pipe', showindex=False)
+    f.write(tabela_str)
diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..a0027af --- /dev/null +++ b/requirements.txt @@ -0,0 +1,10 @@
+mkdocs-material
+mkdocs-git-revision-date-localized-plugin
+markdown
+pymdown-extensions
+tabulate
+requests
+pandas
+openpyxl
+pytest
+pylint
\ No newline at end of file
diff --git a/rl_ementa_en.md b/rl_ementa_en.md new file mode 100644 index 0000000..17ac165 --- /dev/null +++ b/rl_ementa_en.md @@ -0,0 +1,50 @@
+# Reinforcement Learning
+
+Course load: 80 class hours
+
+## Prerequisites
+
+1. Proficiency in Python.
+1. Basic machine learning knowledge.
+
+## Syllabus
+
+Reinforcement Learning (RL). RL Algorithms. How to build a reinforcement learning solution.
+
+## Learning goals
+
+At the end of the course, the student should be able to:
+
+1. Build a Reinforcement Learning system for sequential decision-making.
+1. Understand how to formalize your task as a Reinforcement Learning problem, and how to implement a solution.
+1. Understand the space of RL algorithms (Sarsa, Q-learning, Policy Gradients, and more).
+1. 
Understand how RL fits under the broader umbrella of machine learning, and how it complements supervised and unsupervised learning.
+
+## Detailed Syllabus
+
+1. Introduction to Reinforcement Learning.
+1. Implementation of autonomous agents using reinforcement learning.
+1. Temporal-Difference learning.
+1. Q-Learning algorithm.
+1. Sarsa algorithm.
+1. Policy Gradients and Proximal Policy Optimization (PPO).
+1. Deep Q-Learning algorithms.
+1. Implementations of autonomous agents using OpenAI's Gym project and Kaggle's library for RL.
+1. Reinforcement learning use cases.
+
+## Basic Bibliography
+
+1. GÉRON, A. Hands-on Machine Learning with Scikit-learn, Keras, and TensorFlow, 2nd ed., O'Reilly, 2021.
+1. SUTTON, R.; BARTO, A. Reinforcement Learning: An Introduction. Second Edition. The MIT Press, 2018.
+1. Van Hasselt, H., Guez, A. and Silver, D., 2016, March. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1).
+1. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. and Klimov, O., 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
+1. Brockman, G. et al., 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540.
+
+
+## Supplementary Bibliography
+
+1. NORVIG, P.; RUSSELL, S., Inteligência Artificial, 3rd ed., Campus Elsevier, 2013.
+1. SILVER, D.; SINGH S.; PRECUP D.; SUTTON R. [Reward is enough](https://doi.org/10.1016/j.artint.2021.103535). Artificial Intelligence. Vol 299, 2021.
+1. [MuZero: Mastering Go, chess, shogi and Atari without rules](https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules). Published in December 2020.
+1. SILVER, D.; HUBERT T.; SCHRITTWIESER, J.; ANTONOGLOU, I.; LAI, M.; GUEZ, A. [A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://doi.org/10.1126/science.aar6404). Science 362, 1140-1144 (2018).
+1. 
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
diff --git a/rl_ementa_pt.md b/rl_ementa_pt.md new file mode 100644 index 0000000..14daec4 --- /dev/null +++ b/rl_ementa_pt.md @@ -0,0 +1,51 @@
+# Aprendizagem por Reforço
+
+Carga Horária: 80 horas-aula
+
+## Pré-requisitos
+
+1. Proficiência em Python.
+1. Conhecimento básico em Aprendizagem de Máquina.
+
+## Ementa
+
+Aprendizagem por Reforço. Algoritmos de Aprendizagem por Reforço. Implementação de agentes autônomos usando aprendizagem por reforço.
+
+## Objetivos
+
+Ao final da disciplina o estudante será capaz de:
+
+1. Construir um sistema baseado em aprendizagem por reforço para tomada de decisões sequenciais.
+1. Compreender como se deve formalizar uma tarefa considerando um problema de aprendizagem por reforço e como implementar uma solução.
+1. Compreender os tipos de algoritmos de aprendizagem por reforço: temporal-difference learning, Q-learning, Sarsa, Policy Gradients e outros.
+1. Compreender qual é a relação de aprendizagem por reforço com aprendizagem supervisionada e não-supervisionada.
+
+## Conteúdo Programático
+
+1. Introdução ao Aprendizado por Reforço.
+1. Implementação de agentes autônomos usando aprendizagem por reforço.
+1. Temporal-Difference learning.
+1. Algoritmo Q-Learning.
+1. Algoritmo Sarsa.
+1. Policy Gradients e Proximal Policy Optimization (PPO).
+1. Algoritmos do tipo Deep Q-Learning.
+1. Implementações de agentes autônomos usando o projeto Gym da OpenAI e a biblioteca para reinforcement learning do Kaggle.
+1. Exemplos de soluções usando aprendizagem por reforço.
+
+## Bibliografia Básica
+
+1. GÉRON, A. Hands-on Machine Learning with Scikit-learn, Keras, and TensorFlow, 2ª ed., O'Reilly, 2021.
+1. SUTTON, R.; BARTO, A. Reinforcement Learning: An Introduction. Second Edition. The MIT Press, 2018.
+1. 
Van Hasselt, H., Guez, A. and Silver, D., 2016, March. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1).
+1. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. and Klimov, O., 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
+1. Brockman, G. et al., 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540.
+
+## Bibliografia Complementar
+
+1. NORVIG, P.; RUSSELL, S., Inteligência Artificial, 3ª ed., Campus Elsevier, 2013.
+1. SILVER, D.; SINGH S.; PRECUP D.; SUTTON R. [Reward is enough](https://doi.org/10.1016/j.artint.2021.103535). Artificial Intelligence. Vol 299, 2021.
+1. [MuZero: Mastering Go, chess, shogi and Atari without rules](https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules). Publicado em Dezembro, 2020.
+1. SILVER, D.; HUBERT T.; SCHRITTWIESER, J.; ANTONOGLOU, I.; LAI, M.; GUEZ, A. [A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://doi.org/10.1126/science.aar6404). Science 362, 1140-1144 (2018).
+1. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.