Hi, Rokas:
First of all, thanks for your great tutorial on reinforcement learning. I went through the whole series and learned a lot.
In the PPOAgent, I think there may be something wrong with this line. When I `vstack` `discounted_r` (giving shape (n, 1)) and subtract the predicted values (shape (n,)) from it, NumPy broadcasting makes the advantages shape (n, n). So I think we should not `vstack` `discounted_r`, but instead `vstack` the advantages in this line: `advantages = np.vstack(discounted_r - values)`. Then the advantages have shape (n, 1), which is the expected result.
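Here is a minimal sketch that reproduces the shape problem. The arrays are made-up stand-ins, not the agent's real data, and I am assuming both `discounted_r` and `values` start out as 1-D arrays of length n:

```python
import numpy as np

n = 4
# Hypothetical stand-ins for the agent's arrays (assumed to be 1-D, length n)
discounted_r = np.arange(n, dtype=np.float32)  # shape (n,)
values = np.ones(n, dtype=np.float32)          # shape (n,)

# Stacking first: (n, 1) minus (n,) broadcasts to (n, n)
broadcast_advantages = np.vstack(discounted_r) - values
print(broadcast_advantages.shape)  # (4, 4)

# Subtracting first, then stacking: one advantage per step, shape (n, 1)
advantages = np.vstack(discounted_r - values)
print(advantages.shape)  # (4, 1)
```

With the second form, each row holds exactly one advantage, `discounted_r[i] - values[i]`.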
Thanks.