Fix legal_actions_mask bug in epsilon_greedy(). #28
This PR addresses Issue #27.

Note that the bug must be addressed in two places. First, when selecting `max_value`, it must only be selected from legal actions. Second, when computing `greedy_probs`, there could be multiple action values achieving the max, but not all of them legal.

Also, I added `legal_actions_mask` to the list of values in the `tf.name_scope` context manager.

Aside from that, when there is no legal actions mask the `epsilon_greedy` function should execute exactly the same as before.

Happy to implement any small changes and if there’s a better fix altogether feel free to close this and implement it internally. Just thought I’d offer up a solution :)
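
For reviewers, here is a rough sketch of the two-part fix described in this PR. It is plain Python for a single state, not the TensorFlow code in the diff. The function name `epsilon_greedy_probs` and its signature are hypothetical and chosen only for illustration.

```python
def epsilon_greedy_probs(action_values, epsilon, legal_actions_mask=None):
    """Epsilon-greedy action probabilities for one state.

    Illustrative sketch only: the name and signature are assumptions,
    and this is not the library's TensorFlow implementation.
    """
    num_actions = len(action_values)
    if legal_actions_mask is None:
        # No mask: every action is legal, so behaviour matches the unmasked case.
        legal_actions_mask = [1.0] * num_actions

    num_legal = sum(1 for m in legal_actions_mask if m)
    if num_legal == 0:
        raise ValueError("legal_actions_mask must allow at least one action")

    # Fix 1: take the max over legal actions only.
    max_value = max(
        q for q, m in zip(action_values, legal_actions_mask) if m
    )

    # Fix 2: an action counts as greedy only if it attains the max AND is legal.
    greedy = [
        1.0 if (m and q == max_value) else 0.0
        for q, m in zip(action_values, legal_actions_mask)
    ]
    num_greedy = sum(greedy)

    # Spread epsilon uniformly over legal actions, the rest over greedy ones.
    return [
        (epsilon / num_legal if m else 0.0) + (1.0 - epsilon) * g / num_greedy
        for m, g in zip(legal_actions_mask, greedy)
    ]


# Action 0 has the overall max but is illegal; action 3 ties the legal max
# but is also illegal. Only action 2 should receive the greedy mass.
probs = epsilon_greedy_probs(
    [4.0, 1.0, 3.0, 3.0], epsilon=0.2, legal_actions_mask=[0, 1, 1, 0]
)
print(probs)
```

In this example the illegal actions get probability zero, and the result is approximately `[0.0, 0.1, 0.9, 0.0]`. Without fix 1 the max would be taken from illegal action 0, and without fix 2 the greedy mass would be split with illegal action 3.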