Parallel forward #74
Comments
I'm not sure what you mean. Are you saying parallel forward over the 256 image tokens? That wouldn't work, because each token depends on the previous token. And if you meant parallel over the layers, that wouldn't work either, since each layer depends on the previous layer's output. Maybe you meant parallel backward?
Right now the code can't just do a forward pass over all tokens, because of the caching implementation. It needs to run through every token instead of just masking the attention.
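For context, here is a minimal sketch in plain PyTorch of what "just masking the attention" means (the function name and tensor layout are illustrative, not taken from this repo): a single forward pass scores every position at once, with a causal mask standing in for the sequential key/value cache.

```python
import torch

def parallel_forward(q, k, v):
    # Hypothetical parallel path: score all T positions in one pass.
    # The causal mask stops each position from attending to future
    # tokens, replacing the token-by-token loop the cache forces.
    T = q.shape[-2]
    mask = torch.triu(
        torch.ones(T, T, dtype=torch.bool, device=q.device), diagonal=1
    )
    scores = (q @ k.transpose(-2, -1)) / (k.shape[-1] ** 0.5)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```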
Oh I see, it would be for when you want to do a forward pass over all tokens at once, instead of sampling them one after the other.
#80 solves this |
The model's decoder right now only supports sequential decoding. This is because of the way `attn_state` is implemented. A parallel generation forward pass can be implemented by setting `attn_state` to `None` and handling all cases inside the generation code. This would help solve #58.
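A rough sketch of what the proposed dispatch could look like, assuming a simple dict-based key/value cache (only `attn_state` is taken from the issue; every other name and the cache layout are assumptions, not the repo's actual code):

```python
import torch

def decoder_attention(q, k, v, attn_state=None):
    # attn_state is None: parallel path, full causal attention over
    # all tokens at once (e.g. scoring a known token sequence).
    if attn_state is None:
        T = q.shape[-2]
        mask = torch.triu(
            torch.ones(T, T, dtype=torch.bool, device=q.device), diagonal=1
        )
        scores = (q @ k.transpose(-2, -1)) / (k.shape[-1] ** 0.5)
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v
    # attn_state holds the cache: sequential path, one new token per
    # call, its keys/values appended to the cached ones.
    attn_state["k"] = torch.cat([attn_state["k"], k], dim=-2)
    attn_state["v"] = torch.cat([attn_state["v"], v], dim=-2)
    scores = (q @ attn_state["k"].transpose(-2, -1)) / (k.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ attn_state["v"]
```

Generation code would then seed `attn_state` with empty key/value tensors for sequential sampling, or pass `None` together with the full token sequence to take the parallel path.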