Add stochastic muzero #78

ipsec · 2024-05-15T17:40:17Z

What?

Added minimal support for Stochastic MuZero, as requested in issue #77.

Why?

To be able to train on stochastic environments such as 2048, poker, etc.

How?

Added Afterstate and Encoder models, along with the configurations needed to run them.
Only MLP models are implemented so far, not CNN ones.

Fixes necessary

In the loss function from the last commit, the encoder must receive an observation, as described in the paper:

And here:

The pseudocode does the same at line 931.

I don't know how to get the observation and pass it to the encoder in your code.
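To make the request concrete, here is a minimal sketch of the alignment I mean, assuming the encoder is applied to the observation that follows each unrolled action (so unroll step k uses observation k + 1). The names `toy_encoder` and `chance_codes_for_unroll` are hypothetical and are not part of this repository or of mctx; the toy logits only stand in for a real network.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def one_hot_argmax(logits: Sequence[float]) -> List[int]:
    """Return a one-hot chance code that selects the largest logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return [1 if i == best else 0 for i in range(len(logits))]


def toy_encoder(observation: Vector, num_codes: int = 4) -> List[int]:
    """Hypothetical stand-in for the encoder: observation -> one-hot chance code."""
    # Toy logits: code i scores highest when the observation sums to i.
    total = sum(observation)
    logits = [-abs(total - i) for i in range(num_codes)]
    return one_hot_argmax(logits)


def chance_codes_for_unroll(
    observations: Sequence[Vector],
    unroll_steps: int,
    encoder: Callable[[Vector], List[int]] = toy_encoder,
) -> List[List[int]]:
    """Pair each unroll step k with the chance code encoded from observation k + 1."""
    if len(observations) < unroll_steps + 1:
        raise ValueError("need unroll_steps + 1 observations (root plus one per step)")
    # observations[0] is the root observation fed to the representation network;
    # the encoder only sees the observations that come after each action.
    return [encoder(observations[k + 1]) for k in range(unroll_steps)]


if __name__ == "__main__":
    trajectory = [[0.0], [1.0], [3.0], [2.0]]
    print(chance_codes_for_unroll(trajectory, unroll_steps=3))
```

In other words, the sampled sequence would need to carry the observations for the whole unroll window, not only the root observation.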

In the pseudocode, the value targets are calculated at every unroll step (lines 910 and 948).

I don't know how to do this in your code.
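As a rough illustration of what I mean by a target per unroll step, here is a small sketch of n-step value targets computed at every position of the unroll window. It assumes `rewards[i]` is the reward received after acting at step i and `values[i]` is the search value at step i; the function names are hypothetical and do not come from this repository.

```python
from typing import List, Sequence


def n_step_value_target(
    rewards: Sequence[float],
    values: Sequence[float],
    start: int,
    n_steps: int,
    discount: float,
) -> float:
    """Discounted n-step return from `start`, bootstrapped with values[start + n_steps]."""
    end = min(start + n_steps, len(rewards))
    target = 0.0
    for i in range(start, end):
        target += (discount ** (i - start)) * rewards[i]
    # Bootstrap only if the bootstrap index is still inside the trajectory.
    if start + n_steps < len(values):
        target += (discount ** n_steps) * values[start + n_steps]
    return target


def unroll_value_targets(
    rewards: Sequence[float],
    values: Sequence[float],
    start: int,
    unroll_steps: int,
    n_steps: int,
    discount: float,
) -> List[float]:
    """One value target for the root and one for each of the `unroll_steps` steps."""
    return [
        n_step_value_target(rewards, values, start + k, n_steps, discount)
        for k in range(unroll_steps + 1)
    ]


if __name__ == "__main__":
    print(unroll_value_targets([1.0] * 4, [10.0] * 4, 0, 2, 2, 0.5))
```

Positions past the end of the episode get no bootstrap term here, which is only one possible way to treat the boundary.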


EdanToledo · 2024-05-15T22:11:12Z

Amazing, thanks so much for doing this. I will review the code and make the necessary modifications as soon as I can. It will most likely have to be after next week.

ipsec and others added 7 commits May 9, 2024 19:36

Creating stochastic muzero version using muzero base.

ceaec04

Added basic types.

Created SMZParams to include afterstates params.

c535f82

Completed decision and chance recurrent functions.

Using muzero network configuration to stochastic version

f04b1f6

Minimal stochastic muzero configs added.

777b119

Models created and configurated.

1c5d4cb

ff_stochastic_mz.py running (see next steps below). mctx stochastic_muzero_policy working. Next steps to create: 1. The encoder model; 2. The stochastic muzero loss function.

Encoder Model created.

fd47d39

Loss function.

ba1f703

ipsec mentioned this pull request May 15, 2024

[FEATURE] Add stochastic muzero implementation #77

Open

EdanToledo linked an issue Jun 15, 2024 that may be closed by this pull request

[FEATURE] Add stochastic muzero implementation #77

Open
