1 parent b98c1f7, commit 4812948
Showing 2 changed files with 9 additions and 19 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -835,4 +835,3 @@ def val_dataloader(self): | |
|
||
|
||
# %% | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -36,7 +36,7 @@ | |
r""" | ||
If you have any other suggestions for material or features that you think ARENA could benefit from, please reach out via email `[email protected]` or on the [Slack group](https://join.slack.com/t/arena-la82367/shared_invite/zt-1uvoagohe-JUv9xB7Vr143pdx1UBPrzQ). | ||
***Note** - some of the dates below have been moved backwards recently, because I've become quite busy with other fieldbuilding projects.* | ||
***Note** - some of the dates below have been moved backwards recently, because I've become quite busy with other fieldbuilding projects and working on a research paper. I've unfortunately had to remove most of the fixed date estimates. If there is anything here you're particularly interested in then please reach out, so we know what to prioritise.* | ||
""") | ||
|
||
st.markdown( | ||
|
@@ -47,7 +47,7 @@ | |
This section details all planned changes, i.e. things we're confident will be implemented and which we do have a good idea of the timeline for. They're roughly ordered by priority / release date (i.e. the first ones in this list will be coming first). | ||
<details> | ||
<summary>🗺️ <b>Smoothing the path from PPO to RLHF</b><br>Planned release: <b>July 2023</b></summary> | ||
<summary>🗺️ <b>Smoothing the path from PPO to RLHF</b><br>Planned release: <b>July-August 2023</b></summary> | ||
--- | ||
|
@@ -57,27 +57,27 @@ | |
Some more information about each new section: | ||
#### **PPO (Atari / MuJoCo)** - 9th July 2023 (this has now been released) | ||
#### **PPO (Atari / MuJoCo)** | ||
This will extend the basic PPO material. | ||
The most important conceptual ideas Atari introduces relate to the design of the actor / critic architectures (because the observation space is now an image of pixels rather than four values). We'll use CNNs for our agent, and we'll have shared architecture for the actor and critic (an important idea which reappears when we look at transformers later on). | ||
MuJoCo is more challenging than Atari, because it introduces a **continuous action space**. This will require a redesign of several components of the PPO algorithm. | ||
#### **RLHF (Atari / MuJoCo)** - 23th July 2023 | ||
#### **RLHF (Atari / MuJoCo)** | ||
Practice RLHF in a simple environment, before we get to language models. You'll be able to try RLHF on Atari games, or attempt to replicate results like OpenAI's famous "backflipping noodle" experiment. | ||
#### **PPO (transformers)** - 25th July 2023 | ||
#### **PPO (transformers)** | ||
Now, we move on to transformer architecture. Before we introduce the human feedback element, we'll use the PPO algorithm to train the transformer to do a simple task (generating more periods, i.e. shorter sentences). This will involve some concepts that came up in the PPO Atari material (e.g. the policy and value networks as having shared architecture), as well as some new ideas (e.g. keeping the transformer on-distribution using a KL divergence penalty term). | ||
#### **RLHF (transformers)** - 28th July 2023 | ||
#### **RLHF (transformers)** | ||
This section combines the conceptual ideas from the last 2 sections, by having you implement RLHF on transformers. You'll be doing all the coding from the ground up, i.e. with no assistance from libraries like `trlx`. | ||
#### **RLHF (trlx), re-release** - 1st August 2023 | ||
#### **RLHF (trlx), re-release** | ||
Lastly, we plan to restructure the RLHF & `trlx` chapter, to make it easier to follow & flow better with the rest of the chapter. Learning how to use `trlx` is still valuable, because once you have a conceptual understanding of the RLHF process, there's no need to go through all the low-level implementational details. | ||
|
@@ -90,7 +90,7 @@ | |
We'll start to release a monthly sequence on mechanistic interpretability, in the same vein as Steven Caspar's sequence (they will also be published on LessWrong). Each challenge will involve interpreting a transformer trained on a particular LeetCode problem. The balanced brackets transformer exercises are a good example (although I expect the average task in this sequence to be slightly easier). | ||
**The first one is planned for release on Saturday 15th July.** | ||
**The first one is currently live.** | ||
</details> | ||
|
@@ -125,22 +125,13 @@ | |
</details> | ||
<details> | ||
<summary>📽️ <b>Video walkthroughs of solutions</b><br>Planned release: <b>July - August 2023</b></summary> | ||
--- | ||
Over the next couple of months, video walkthroughs to most of the exercise sets are planned to be released. The mechanistic interpretability chapter will be highest priority; after that we'll work on videos for the RL chapter. | ||
</details> | ||
## Speculative | ||
This section lists a few changes we're thinking about making, but which we aren't certain about yet. | ||
* A chapter on diffusion models: GANs, VAEs, diffusion models, stable diffusion. | ||
* Evals (possibly as a subsection of a chapter, or as its own micro section). | ||
* Video walkthroughs of solutions to exercises (this depends on demand). | ||
""", unsafe_allow_html=True) |
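
The PPO (transformers) entry in the roadmap text mentions keeping the transformer on-distribution with a KL divergence penalty term. As a rough illustration only (this is not taken from the ARENA material, and the names `penalised_reward` and `kl_coef` are hypothetical), one common way to do this is to subtract a scaled KL divergence between the current policy's next-token distribution and a frozen reference model's distribution from the task reward. A minimal pure-Python sketch for a single token position, assuming both models expose logits over the same vocabulary:

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def penalised_reward(task_reward, policy_logits, reference_logits, kl_coef=0.1):
    """Task reward minus kl_coef * KL(policy || reference).

    `kl_coef` is an illustrative hyperparameter; its value here is an assumption.
    """
    policy = softmax(policy_logits)
    reference = softmax(reference_logits)
    return task_reward - kl_coef * kl_divergence(policy, reference)


# Identical distributions incur no penalty; a drifted policy is penalised.
unchanged = penalised_reward(1.0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
drifted = penalised_reward(1.0, [2.0, 0.0, 0.0], [0.0, 0.0, 0.0])
```

The further the policy's distribution moves from the reference model's, the larger the penalty, so the optimiser is discouraged from chasing reward by producing text the original model would consider very unlikely.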