diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..faa1838 --- /dev/null +++ b/.gitignore @@ -0,0 +1,2 @@
+.vscode
+
diff --git a/README.md b/README.md index c6e7be1..75b131b 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,40 @@
-# rl
-Reinforcement Learning subject
+# Reinforcement Learning
+
+This repository contains the Reinforcement Learning subject material.
+
+## Offerings
+
+* 2023/1 - Fabrício Barth
+
+## How to set up the environment
+
+```bash
+python3.7 -m virtualenv venv
+source venv/bin/activate
+python -m pip install --upgrade pip
+pip install -r requirements.txt
+```
+
+## How to compile slides
+
+```bash
+pandoc -t beamer slides.md -o slides.pdf
+```
+
+## How to deploy the web page
+
+```bash
+mkdocs gh-deploy
+```
+
+## How to run the web server locally
+
+```bash
+mkdocs serve
+```
+
+## How to publish the lessons plan
+
+```bash
+python publish_lessons_plan.py
+```
diff --git a/docs/_snippets/plan.md b/docs/_snippets/plan.md new file mode 100644 index 0000000..95b62f7 --- /dev/null +++ b/docs/_snippets/plan.md @@ -0,0 +1,36 @@
+| Date       | Content                                                                  |
+|:-----------|:------------------------------------------------------------------------|
+| 2023-02-07 | Introduction to Reinforcement Learning                                   |
+| 2023-02-09 | Problem-solving searching review                                         |
+| 2023-02-14 | Adversarial search and games review                                      |
+| 2023-02-16 | Reinforcement Learning Tooling and Environments                          |
+| 2023-02-23 | Q-Learning Algorithm                                                     |
+| 2023-02-28 | Q-Learning Algorithm                                                     |
+| 2023-03-02 | SARSA Algorithm                                                          |
+| 2023-03-07 | How to evaluate the performance of an agent?                             |
| +| 2023-03-09 00:00:00 | Using RL in non-deterministic environments | +| 2023-03-14 00:00:00 | Using RL in a competitive environment | +| 2023-03-16 00:00:00 | Using RL in a competitive environment with random behavior | +| 2023-03-21 00:00:00 | Implementing an agent to deal with an environment a little more complex | +| 2023-03-23 00:00:00 | Deep Neural Networks review | +| 2023-03-28 00:00:00 | Deep Neural Networks review | +| 2023-03-30 00:00:00 | Midterm assessment - we do not have class | +| 2023-04-04 00:00:00 | Midterm assessment - we do not have class | +| 2023-04-06 00:00:00 | We do not have class | +| 2023-04-11 00:00:00 | Neural Network Policies | +| 2023-04-13 00:00:00 | Deep Q-Learning | +| 2023-04-18 00:00:00 | Deep Q-Learning | +| 2023-04-20 00:00:00 | Double Deep Q-Learning | +| 2023-04-25 00:00:00 | Double Deep Q-Learning | +| 2023-04-27 00:00:00 | Policy Optimization Algorithms (PPO) | +| 2023-05-02 00:00:00 | Policy Optimization Algorithms (PPO) | +| 2023-05-04 00:00:00 | Implementation of RL using TF-Agents | +| 2023-05-09 00:00:00 | Implementation of RL using TF-Agents | +| 2023-05-11 00:00:00 | Final Project | +| 2023-05-16 00:00:00 | Final Project | +| 2023-05-18 00:00:00 | Final Project | +| 2023-05-23 00:00:00 | Final Project | +| 2023-05-25 00:00:00 | Final Project | +| 2023-05-30 00:00:00 | Final Project | +| 2023-06-01 00:00:00 | Final Assessment - we do not have class | +| 2023-06-06 00:00:00 | Final Assessment - we do not have class | \ No newline at end of file diff --git a/docs/assessment.md b/docs/assessment.md new file mode 100644 index 0000000..b0eb94b --- /dev/null +++ b/docs/assessment.md @@ -0,0 +1 @@ +# Student Assessment diff --git a/docs/classes/01_introduction/index.md b/docs/classes/01_introduction/index.md new file mode 100644 index 0000000..d40aadd --- /dev/null +++ b/docs/classes/01_introduction/index.md @@ -0,0 +1,19 @@ +# Introduction to Reinforcement Learning + +1. Definition and key concepts +1. 
Differences from other machine learning techniques
+1. Real-world applications
+
+1. How will this subject work?
+    1. Requirements
+    1. This is a hands-on subject!
+    1. Content
+    1. Assignments
+
+## Activities for the next class
+
+1. Read the chapter "II Problem-solving" from the AIMA book, or search the internet for problem-solving search algorithms.
+
+## References
+
+* xxxx
\ No newline at end of file
diff --git a/docs/classes/02_problem_solving/index.md b/docs/classes/02_problem_solving/index.md new file mode 100644 index 0000000..9366110 --- /dev/null +++ b/docs/classes/02_problem_solving/index.md @@ -0,0 +1,4 @@
+# Problem-solving searching review
+
+1. Problem-solving searching review
+1. Exercise: the implementation of a taxi driver agent
diff --git a/docs/classes/03_games/index.md b/docs/classes/03_games/index.md new file mode 100644 index 0000000..4e3ddec --- /dev/null +++ b/docs/classes/03_games/index.md @@ -0,0 +1,4 @@
+# Adversarial search and games review
+
+1. Adversarial search and games review
+1. Exercise: the implementation of a tic-tac-toe player.
diff --git a/docs/classes/04_toolings_envs/index.md b/docs/classes/04_toolings_envs/index.md new file mode 100644 index 0000000..cd1964a --- /dev/null +++ b/docs/classes/04_toolings_envs/index.md @@ -0,0 +1,6 @@
+# Reinforcement Learning Tooling and Environments
+
+1. [The Farama Foundation](https://farama.org/Announcing-The-Farama-Foundation)
+1. Other tools and environments.
+1. How to use the [Gymnasium API](https://gymnasium.farama.org/).
+1. Playing with the Gymnasium API.
diff --git a/docs/classes/05_q_learning/index.md b/docs/classes/05_q_learning/index.md new file mode 100644 index 0000000..61919b1 --- /dev/null +++ b/docs/classes/05_q_learning/index.md @@ -0,0 +1,5 @@
+# Q-Learning Algorithm
+
+1. Definition and key concepts
+1. 
Implementation
+
diff --git a/docs/classes/07_sarsa/index.md b/docs/classes/07_sarsa/index.md new file mode 100644 index 0000000..3fc027f --- /dev/null +++ b/docs/classes/07_sarsa/index.md @@ -0,0 +1,7 @@
+# SARSA Algorithm
+
+1. Definition and key concepts
+1. The main difference between Q-Learning and SARSA
+1. Implementation
+
+
diff --git a/docs/classes/08_evaluation/index.md b/docs/classes/08_evaluation/index.md new file mode 100644 index 0000000..660a3bd --- /dev/null +++ b/docs/classes/08_evaluation/index.md @@ -0,0 +1,5 @@
+# How to evaluate the performance of an agent?
+
+1. Metrics
+1. How to summarize results
+1. Exercise: compare the Q-Learning and SARSA algorithms in a deterministic environment
\ No newline at end of file
diff --git a/docs/classes/09_non_determ/index.md b/docs/classes/09_non_determ/index.md new file mode 100644 index 0000000..0411122 --- /dev/null +++ b/docs/classes/09_non_determ/index.md @@ -0,0 +1,3 @@
+# Using RL in non-deterministic environments
+
+1. Exercise: implement two agents for the Frozen Lake problem using the Q-Learning and Sarsa algorithms and compare the results
diff --git a/docs/classes/10_game_env/index.md b/docs/classes/10_game_env/index.md new file mode 100644 index 0000000..175c9a7 --- /dev/null +++ b/docs/classes/10_game_env/index.md @@ -0,0 +1,4 @@
+# Using RL in a competitive environment
+
+1. Exercise: implement an agent to play tic-tac-toe using the Q-Learning or Sarsa algorithms and show the results.
+
diff --git a/docs/classes/11_game_env_random/index.md b/docs/classes/11_game_env_random/index.md new file mode 100644 index 0000000..84c6f1d --- /dev/null +++ b/docs/classes/11_game_env_random/index.md @@ -0,0 +1,4 @@
+# Using RL in a competitive environment with random behavior
+
+1. Exercise: implement an agent to play Blackjack using the Q-Learning or Sarsa algorithms and show the results. 
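The tabular exercises above all revolve around the same Q-Learning update rule. The following is a minimal, dependency-free sketch of that update on a toy five-state corridor; the environment, hyperparameters, and episode count are illustrative only, not part of the course material:

```python
import random

# Toy deterministic corridor: states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 ends the episode with reward 1; every other step gives 0.
# (Illustrative environment only -- the course exercises use Gymnasium environments.)
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

alpha, gamma, epsilon = 0.5, 0.9, 0.1  # illustrative hyperparameters
Q = [[0.0, 0.0] for _ in range(N_STATES)]

random.seed(0)
for _ in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = Q[state].index(max(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-Learning update: bootstrap from the best action in the next state
        target = reward + gamma * max(Q[next_state]) * (not done)
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

print([q.index(max(q)) for q in Q[:GOAL]])  # greedy policy: 1 (right) in every state
```

The same update loop carries over to the exercises; only the environment interface (e.g. Gymnasium's `reset`/`step`) and the Q-table indexing change.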
+
diff --git a/docs/classes/12_more_complex/index.md b/docs/classes/12_more_complex/index.md new file mode 100644 index 0000000..9e95fa1 --- /dev/null +++ b/docs/classes/12_more_complex/index.md @@ -0,0 +1,4 @@
+# Implementing an agent to deal with an environment a little more complex
+
+1. Exercise: implement an agent to run a mountain car.
+1. Discussion: how can we implement agents using RL for environments like LunarLander, Atari, and others?
\ No newline at end of file
diff --git a/docs/classes/13_nn_review/index.md b/docs/classes/13_nn_review/index.md new file mode 100644 index 0000000..5a5d26a --- /dev/null +++ b/docs/classes/13_nn_review/index.md @@ -0,0 +1,5 @@
+# Deep Neural Networks review
+
+1. Neural Networks
+1. Gradient descent and optimization
+1. Exercise: implement a neural network.
\ No newline at end of file
diff --git a/docs/classes/14_nn_policies/index.md b/docs/classes/14_nn_policies/index.md new file mode 100644 index 0000000..835df2f --- /dev/null +++ b/docs/classes/14_nn_policies/index.md @@ -0,0 +1,5 @@
+# Neural Network Policies
+
+1. Policy Gradients
+1. Exercise: implement a neural network policy
+
diff --git a/docs/classes/15_deep_q_learning/index.md b/docs/classes/15_deep_q_learning/index.md new file mode 100644 index 0000000..e906f7b --- /dev/null +++ b/docs/classes/15_deep_q_learning/index.md @@ -0,0 +1,5 @@
+# Deep Q-Learning
+
+1. Definitions and key concepts
+1. Deep Q-Learning implementation
+1. Exercise: implement a Lunar Lander agent using DQN
\ No newline at end of file
diff --git a/docs/classes/16_double_deep_q_learning/index.md b/docs/classes/16_double_deep_q_learning/index.md new file mode 100644 index 0000000..31f59c9 --- /dev/null +++ b/docs/classes/16_double_deep_q_learning/index.md @@ -0,0 +1,5 @@
+# Double Deep Q-Learning
+
+1. Definitions and key concepts
+1. What are the differences between Deep Q-Learning and Double Deep Q-Learning?
+1. 
Exercise: implement Double Deep Q-Learning and compare its results with Deep Q-Learning
diff --git a/docs/classes/17_ppo/index.md b/docs/classes/17_ppo/index.md new file mode 100644 index 0000000..8ea73f5 --- /dev/null +++ b/docs/classes/17_ppo/index.md @@ -0,0 +1,5 @@
+# Policy Optimization Algorithms (PPO)
+
+1. Definitions and key concepts
+1. Implementation
+
diff --git a/docs/classes/18_tf_agents/index.md b/docs/classes/18_tf_agents/index.md new file mode 100644 index 0000000..e86642e --- /dev/null +++ b/docs/classes/18_tf_agents/index.md @@ -0,0 +1,7 @@
+# Implementation of RL using TF-Agents
+
+TBD
+
+## References
+
+* [TF-Agents](https://www.tensorflow.org/agents)
\ No newline at end of file
diff --git a/docs/classes/19_final_project/index.md b/docs/classes/19_final_project/index.md new file mode 100644 index 0000000..216f641 --- /dev/null +++ b/docs/classes/19_final_project/index.md @@ -0,0 +1,3 @@
+# Final Project
+
+TBD
\ No newline at end of file
diff --git a/docs/css/custom.css b/docs/css/custom.css new file mode 100644 index 0000000..0022aa8 --- /dev/null +++ b/docs/css/custom.css @@ -0,0 +1,19 @@
+#alunos ~ table td, #avaliacao ~ table td {
+    vertical-align: middle;
+}
+
+
+img.event-picture {
+    width: 40%;
+    height: 200px;
+    display: inline-block;
+    object-fit: cover;
+}
+
+.skill-icon > svg {
+    max-width: 40px !important;
+    max-height: 40px !important;
+
+    width: 40px !important;
+    height: 40px !important;
+}
\ No newline at end of file
diff --git a/docs/css/github.png b/docs/css/github.png new file mode 100644 index 0000000..8b25551 Binary files /dev/null and b/docs/css/github.png differ
diff --git a/docs/goals.md b/docs/goals.md new file mode 100644 index 0000000..d05e8d0 --- /dev/null +++ b/docs/goals.md @@ -0,0 +1,8 @@
+# Learning Goals
+
+At the end of the course, the student should be able to:
+
+1. Build a Reinforcement Learning system for sequential decision-making.
+1. 
Understand how to formalize your task as a Reinforcement Learning problem, and how to implement a solution.
+1. Understand the space of RL algorithms (Sarsa, Q-learning, Policy Gradients, and more).
+1. Understand how RL fits under the broader umbrella of machine learning, and how it complements supervised and unsupervised learning.
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..c32ff3a --- /dev/null +++ b/docs/index.md @@ -0,0 +1,17 @@
+# Reinforcement Learning - 2023/1
+
+1. [Learning Goals](goals.md)
+2. [Plan](plan.md)
+3. [Student Assessment](assessment.md)
+
+## Class Schedule
+
+Tuesday and Thursday from 3:45 PM until 5:45 PM.
+
+## Extra period
+
+Thursday from 12 PM until 1:30 PM.
+
+## Contact information
+
+If you have any questions or comments, please send an e-mail to fabriciojb at insper dot edu dot br.
\ No newline at end of file
diff --git a/docs/plan.md b/docs/plan.md new file mode 100644 index 0000000..1f4d395 --- /dev/null +++ b/docs/plan.md @@ -0,0 +1,5 @@
+# Plan
+
+The following activities are planned. The plan is subject to change and adaptation as the course progresses. 
+ +--8<-- "plan.md" \ No newline at end of file diff --git a/lessons_plan.xlsx b/lessons_plan.xlsx new file mode 100644 index 0000000..ed2fce9 Binary files /dev/null and b/lessons_plan.xlsx differ diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100644 index 0000000..61e22e7 --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,76 @@ +site_name: Reinforcement Learning +repo_url: https://github.com/Insper/rl/ +repo_name: Reinforcement Learning +site_url: https://insper.github.io/rl/ + +theme: + name: 'material' + +extra_css: + - css/custom.css + +nav: + - 'Home': 'index.md' + - 'Goals': 'goals.md' + - 'Plan': 'plan.md' + - 'Student Assessment': 'assessment.md' + - 'Classes': + - 'classes/01_introduction/index.md' + - 'classes/02_problem_solving/index.md' + - 'classes/03_games/index.md' + - 'classes/04_toolings_envs/index.md' + - 'classes/05_q_learning/index.md' + - 'classes/07_sarsa/index.md' + - 'classes/08_evaluation/index.md' + - 'classes/09_non_determ/index.md' + - 'classes/10_game_env/index.md' + - 'classes/11_game_env_random/index.md' + - 'classes/12_more_complex/index.md' + - 'classes/13_nn_review/index.md' + - 'classes/14_nn_policies/index.md' + - 'classes/15_deep_q_learning/index.md' + - 'classes/16_double_deep_q_learning/index.md' + - 'classes/17_ppo/index.md' + - 'classes/18_tf_agents/index.md' + - 'classes/19_final_project/index.md' + + +extra_javascript: + - https://cdnjs.cloudflare.com/ajax/libs/js-yaml/4.0.0/js-yaml.min.js + - js/markdown-enhancer.js + - javascripts/mathjax.js + - https://polyfill.io/v3/polyfill.min.js?features=es6 + - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js + +markdown_extensions: + - pymdownx.arithmatex: + generic: true + - attr_list + - markdown.extensions.admonition + - pymdownx.tasklist: + custom_checkbox: true + - pymdownx.details + - pymdownx.tabbed + - pymdownx.superfences + - pymdownx.magiclink + - pymdownx.critic: + mode: view + - pymdownx.betterem: + smart_enable: all + - pymdownx.caret + - pymdownx.mark + - 
pymdownx.tilde
+  - pymdownx.smartsymbols
+  - pymdownx.snippets:
+      base_path: "docs/_snippets"
+      check_paths: true
+  - pymdownx.emoji:
+      emoji_index: !!python/name:materialx.emoji.twemoji
+      emoji_generator: !!python/name:materialx.emoji.to_svg
+  - footnotes
+
+plugins:
+  - git-revision-date-localized
+
+
+
diff --git a/publish_lessons_plan.py b/publish_lessons_plan.py new file mode 100644 index 0000000..a7abb35 --- /dev/null +++ b/publish_lessons_plan.py @@ -0,0 +1,9 @@
+import tabulate
+import pandas as pd
+
+t1 = pd.read_excel('lessons_plan.xlsx')
+t1['Date'] = t1['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))  # format dates for the published table
+
+with open('docs/_snippets/plan.md', 'w') as f:
+    tabela_str = tabulate.tabulate(t1[['Date', 'Content']], headers=['Date', 'Content'], tablefmt='pipe', showindex=False)
+    f.write(tabela_str)
diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..a0027af --- /dev/null +++ b/requirements.txt @@ -0,0 +1,10 @@
+mkdocs-material
+mkdocs-git-revision-date-localized-plugin
+markdown
+pymdown-extensions
+tabulate
+requests
+pandas
+openpyxl
+pytest
+pylint
\ No newline at end of file
diff --git a/rl_ementa_en.md b/rl_ementa_en.md new file mode 100644 index 0000000..17ac165 --- /dev/null +++ b/rl_ementa_en.md @@ -0,0 +1,50 @@
+# Reinforcement Learning
+
+Course load: 80 class hours
+
+## Prerequisites
+
+1. Proficiency in Python.
+1. Basic machine learning knowledge.
+
+## Syllabus
+
+Reinforcement Learning (RL). RL Algorithms. How to build a reinforcement learning solution.
+
+## Learning goals
+
+At the end of the course, the student should be able to:
+
+1. Build a Reinforcement Learning system for sequential decision-making.
+1. Understand how to formalize your task as a Reinforcement Learning problem, and how to implement a solution.
+1. Understand the space of RL algorithms (Sarsa, Q-learning, Policy Gradients, and more).
+1. 
Understand how RL fits under the broader umbrella of machine learning, and how it complements supervised and unsupervised learning.
+
+## Detailed Syllabus
+
+1. Introduction to Reinforcement Learning.
+1. Implementation of autonomous agents using reinforcement learning.
+1. Temporal-Difference learning.
+1. Q-Learning algorithm.
+1. Sarsa algorithm.
+1. Policy Gradients and Proximal Policy Optimization (PPO).
+1. Deep Q-Learning algorithms.
+1. Implementations of autonomous agents using OpenAI's Gym project and Kaggle's library for RL.
+1. Reinforcement learning use cases.
+
+## Basic Bibliography
+
+1. GÉRON, A. Hands-on Machine Learning with Scikit-learn, Keras, and TensorFlow, 2nd ed., O'Reilly, 2021.
+1. SUTTON, R.; BARTO, A. Reinforcement Learning: An Introduction. Second Edition. The MIT Press, 2018.
+1. Van Hasselt, H., Guez, A. and Silver, D., 2016, March. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1).
+1. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. and Klimov, O., 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
+1. Brockman, G. et al., 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540.
+
+
+## Supplementary Bibliography
+
+1. NORVIG, P.; RUSSELL, S., Inteligência Artificial, 3rd ed., Campus Elsevier, 2013.
+1. SILVER, D.; SINGH S.; PRECUP D.; SUTTON R. [Reward is enough](https://doi.org/10.1016/j.artint.2021.103535). Artificial Intelligence. Vol 299, 2021.
+1. [MuZero: Mastering Go, chess, shogi and Atari without rules](https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules). Published in December 2020.
+1. SILVER, D.; HUBERT T.; SCHRITTWIESER, J.; ANTONOGLOU, I.; LAI, M.; GUEZ, A. [A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://doi.org/10.1126/science.aar6404). Science 362, 1140-1144 (2018).
+1. 
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
diff --git a/rl_ementa_pt.md b/rl_ementa_pt.md new file mode 100644 index 0000000..14daec4 --- /dev/null +++ b/rl_ementa_pt.md @@ -0,0 +1,51 @@
+# Aprendizagem por Reforço
+
+Carga Horária: 80 horas-aula
+
+## Pré-requisitos
+
+1. Proficiência em Python.
+1. Conhecimento básico em Aprendizagem de Máquina.
+
+## Ementa
+
+Aprendizagem por Reforço. Algoritmos de Aprendizagem por Reforço. Implementação de agentes autônomos usando aprendizagem por reforço.
+
+## Objetivos
+
+Ao final da disciplina o estudante será capaz de:
+
+1. Construir um sistema baseado em aprendizagem por reforço para tomada de decisões sequenciais.
+1. Compreender como se deve formalizar uma tarefa considerando um problema de aprendizagem por reforço e como implementar uma solução.
+1. Compreender os tipos de algoritmos de aprendizagem por reforço: temporal-difference learning, Q-learning, Sarsa, Policy Gradients e outros.
+1. Compreender qual é a relação de aprendizagem por reforço com aprendizagem supervisionada e não-supervisionada.
+
+## Conteúdo Programático
+
+1. Introdução ao Aprendizado por Reforço.
+1. Implementação de agentes autônomos usando aprendizagem por reforço.
+1. Temporal-Difference learning.
+1. Algoritmo Q-Learning.
+1. Algoritmo Sarsa.
+1. Policy Gradients e Proximal Policy Optimization (PPO).
+1. Algoritmos do tipo Deep Q-Learning.
+1. Implementações de agentes autônomos usando o projeto Gym da OpenAI e a biblioteca para reinforcement learning do Kaggle.
+1. Exemplos de soluções usando aprendizagem por reforço.
+
+## Bibliografia Básica
+
+1. GÉRON, A. Hands-on Machine Learning with Scikit-learn, Keras, and TensorFlow, 2ª ed., O'Reilly, 2021.
+1. SUTTON, R.; BARTO, A. Reinforcement Learning: An Introduction. Second Edition. The MIT Press, 2018.
+1. 
Van Hasselt, H., Guez, A. and Silver, D., 2016, March. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 30, No. 1).
+1. Schulman, J., Wolski, F., Dhariwal, P., Radford, A. and Klimov, O., 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
+1. Brockman, G. et al., 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540.
+
+## Bibliografia Complementar
+
+1. NORVIG, P.; RUSSELL, S., Inteligência Artificial, 3ª ed., Campus Elsevier, 2013.
+1. SILVER, D.; SINGH S.; PRECUP D.; SUTTON R. [Reward is enough](https://doi.org/10.1016/j.artint.2021.103535). Artificial Intelligence. Vol 299, 2021.
+1. [MuZero: Mastering Go, chess, shogi and Atari without rules](https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules). Publicado em Dezembro, 2020.
+1. SILVER, D.; HUBERT T.; SCHRITTWIESER, J.; ANTONOGLOU, I.; LAI, M.; GUEZ, A. [A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://doi.org/10.1126/science.aar6404). Science 362, 1140-1144 (2018).
+1. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.