Add stochastic muzero #78

Open: wants to merge 7 commits into main
@ipsec commented May 15, 2024

What?

Added minimal support for Stochastic MuZero, per issue #77.

Why?

To be able to train on stochastic environments such as 2048 and poker.

How?

Added Afterstate and Encoder models, together with the configurations needed to run them.
Only MLP versions of these models are implemented so far, not CNN versions.
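
For orientation, the roles of these networks can be sketched as below. This is a minimal plain-Python illustration of the decomposition used in the Stochastic MuZero paper, not the Flax modules added in this PR. All names (`dense`, `afterstate_dynamics`, `afterstate_prediction`, `encoder`, `dynamics`) and all sizes are hypothetical, and a single random linear layer stands in for each MLP.

```python
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, NUM_ACTIONS, NUM_CODES, OBS_DIM = 4, 3, 5, 6


def dense(in_dim, out_dim):
    """Return a randomly initialised linear layer (a stand-in for an MLP)."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


# Afterstate dynamics: (state, action) -> afterstate.
_phi = dense(STATE_DIM + NUM_ACTIONS, STATE_DIM)
def afterstate_dynamics(state, action):
    return _phi(state + one_hot(action, NUM_ACTIONS))


# Afterstate prediction: afterstate -> (afterstate value, chance logits).
_q, _sigma = dense(STATE_DIM, 1), dense(STATE_DIM, NUM_CODES)
def afterstate_prediction(afterstate):
    return _q(afterstate)[0], _sigma(afterstate)


# Encoder: observation -> logits over discrete chance codes.
_enc = dense(OBS_DIM, NUM_CODES)
def encoder(observation):
    return _enc(observation)


# Dynamics: (afterstate, chance code) -> (next state, reward).
_g, _r = dense(STATE_DIM + NUM_CODES, STATE_DIM), dense(STATE_DIM + NUM_CODES, 1)
def dynamics(afterstate, chance_code):
    x = afterstate + one_hot(chance_code, NUM_CODES)
    return _g(x), _r(x)[0]


# One model step: state -> afterstate -> (chance code from the next observation) -> next state.
state = [0.1] * STATE_DIM
next_observation = [0.5] * OBS_DIM
afterstate = afterstate_dynamics(state, action=1)
q_value, chance_logits = afterstate_prediction(afterstate)
code_logits = encoder(next_observation)
code = max(range(NUM_CODES), key=code_logits.__getitem__)
next_state, reward = dynamics(afterstate, code)
```

The point of the sketch is the data flow: the encoder is the only network that sees a real observation, and its discrete output selects which chance outcome the dynamics network follows.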

Fixes necessary

  1. In the loss function from the last commit, the encoder must receive an observation, as described in the paper:

[image]

And here:

[image]

The pseudocode does the same at line 931.

I don't know how to get the observation and pass it to the encoder in your code.

  2. In the pseudocode, the value targets are calculated at every unroll step (lines 910 and 948).

I don't know how to do this in your code.
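
Both open points can be sketched in plain Python, independent of this repository's JAX code. This is only an illustration of my reading of the paper, not the PR's loss function: `chance_losses` feeds the observation at step k+1 to the encoder and uses the resulting one-hot code as the target for the chance logits predicted at step k, and `n_step_value_targets` recomputes an n-step return for every unroll step. The function names, argument layouts, and the toy `encoder` in the usage lines are hypothetical. The straight-through gradient and the commitment loss of the encoder are left out, and the end of the reward list is treated as the end of the episode.

```python
import math


def one_hot_argmax(logits):
    """Quantise encoder logits to a one-hot chance code."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(logits))]


def softmax_cross_entropy(logits, target):
    """Cross-entropy between softmax(logits) and a one-hot target."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(t * (l - log_z) for l, t in zip(logits, target))


def chance_losses(observations, chance_logits_per_step, encoder):
    """observations[k] is the observation at step t+k.
    chance_logits_per_step[k] are the chance logits predicted from the
    afterstate at unroll step k."""
    losses = []
    for k, sigma_k in enumerate(chance_logits_per_step):
        # The encoder sees the NEXT real observation, not a latent state.
        code = one_hot_argmax(encoder(observations[k + 1]))
        losses.append(softmax_cross_entropy(sigma_k, code))
    return losses


def n_step_value_targets(rewards, values, discount, n_steps, unroll_steps):
    """rewards[i] is the reward received after acting at step t+i.
    values[i] is the bootstrap value (e.g. from search) at step t+i.
    Returns one target per unroll step k = 0..unroll_steps."""
    targets = []
    for k in range(unroll_steps + 1):
        target = 0.0
        horizon = min(n_steps, len(rewards) - k)  # truncate at trajectory end
        for i in range(horizon):
            target += discount**i * rewards[k + i]
        bootstrap_index = k + n_steps
        if bootstrap_index < len(values):
            target += discount**n_steps * values[bootstrap_index]
        targets.append(target)
    return targets


# Usage with toy data: the "encoder" here just returns the observation as logits.
toy_encoder = lambda obs: obs
losses = chance_losses(
    observations=[[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [3.0, 0.0, 0.0]],
    chance_logits_per_step=[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
    encoder=toy_encoder,
)
targets = n_step_value_targets(
    rewards=[1.0, 2.0, 3.0, 4.0],
    values=[0.0, 0.0, 8.0, 16.0, 32.0],
    discount=0.5,
    n_steps=2,
    unroll_steps=2,
)
```

In a JAX implementation the same two loops would typically run inside the unroll (for example under `jax.lax.scan`), which means the sampled sequence has to carry observations and search values for all unroll steps, not just the first one.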

ipsec and others added 7 commits May 9, 2024 19:36
Completed decision and chance recurrent functions.
ff_stochastic_mz.py running (see next steps below).
mctx stochastic_muzero_policy working.
Next steps to create:
1. The encoder model;
2. The stochastic muzero loss function.
@EdanToledo (Owner)

Amazing, thanks so much for doing this. I will review the code and make the necessary modifications as soon as I can. It will most likely have to be after next week.

@EdanToledo EdanToledo linked an issue Jun 15, 2024 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

[FEATURE] Add stochastic muzero implementation
2 participants