A problem in PPOAgent #6

sailxjx · 2021-04-28T08:03:20Z

Hi, Rokas:

First of all, thanks for your great tutorial series on reinforcement learning. I went through the whole series and learned a lot.

In the PPOAgent, I think there may be something wrong with this line. When I `vstack` `discounted_r` (giving shape `(n, 1)`) and then subtract the predicted values (shape `(n,)`), NumPy broadcasting turns the advantages into shape `(n, n)`. So I think we should not `vstack` `discounted_r` on its own, but `vstack` the difference instead: `advantages = np.vstack(discounted_r - values)`. Then the advantages have shape `(n, 1)`, which is the expected result.
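Here is a minimal sketch of the shape problem, assuming `discounted_r` and `values` are both 1-D NumPy arrays of length n. The helper name `compute_advantages` and the sample numbers are hypothetical, made up for illustration, and not taken from the repository:

```python
import numpy as np

def compute_advantages(discounted_r, values):
    """Hypothetical helper: returns minus critic predictions, as an (n, 1) column."""
    discounted_r = np.asarray(discounted_r, dtype=np.float64).reshape(-1)
    values = np.asarray(values, dtype=np.float64).reshape(-1)
    # Subtract while both arrays are still 1-D (n,), then stack into a column.
    # np.vstack on a 1-D array of length n yields shape (n, 1).
    return np.vstack(discounted_r - values)

# Made-up example data
discounted_r = np.array([1.0, 2.0, 3.0, 4.0])
values = np.array([0.5, 0.5, 0.5, 0.5])

# Problematic order: (n, 1) - (n,) broadcasts to (n, n)
buggy = np.vstack(discounted_r) - values
print(buggy.shape)  # (4, 4)

# Proposed order: subtract first, then vstack
fixed = compute_advantages(discounted_r, values)
print(fixed.shape)  # (4, 1)
```

`(discounted_r - values).reshape(-1, 1)` would give the same `(n, 1)` column; the sketch uses `np.vstack` only to mirror the line discussed above.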

Thanks.
