In the `update_kv` function of the `H2OKVCluster` class, I see this code. As far as I know, there is no concept of a window in H2O. Shouldn't the entire `query_states` matrix be considered for the `attn_weights` computation? Why are you only slicing out the last `window_size` rows of the query states for the matrix multiplication here?
Thank you for pointing this out!
As you mentioned, the current code is inconsistent with standard H2O. We have tested performance both with the entire `query_states` and with only the windowed slice when computing the attention scores, and found that this has very limited influence on performance. We will address this inconsistency in the updated code.
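For readers following along, here is a minimal sketch (not the repository's actual `update_kv` implementation) that contrasts the two variants being discussed: scoring KV entries with attention mass accumulated from all query rows (standard H2O) versus only the last `window_size` query rows. The function name `score_kv_entries` and the causal-mask omission are simplifications for illustration.

```python
# Minimal sketch, assuming HuggingFace-style tensor shapes
# (batch, num_heads, seq_len, head_dim). A causal mask is omitted for brevity.
import math
import torch

def score_kv_entries(query_states, key_states, window_size=None):
    """Return per-key attention mass used to rank heavy-hitter KV entries.

    If window_size is None, every query row contributes (standard H2O);
    otherwise only the last `window_size` query rows are used, as in the
    code snippet the issue refers to.
    """
    head_dim = query_states.shape[-1]
    q = query_states if window_size is None else query_states[..., -window_size:, :]
    # (batch, heads, q_len, kv_len) attention weights
    attn_weights = torch.matmul(q, key_states.transpose(2, 3)) / math.sqrt(head_dim)
    attn_weights = torch.softmax(attn_weights, dim=-1, dtype=torch.float32)
    # Sum the attention mass each key receives across the chosen query rows.
    return attn_weights.sum(dim=-2)  # (batch, heads, kv_len)

# Example: scores from all queries vs. only the last 32 queries.
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
full_scores = score_kv_entries(q, k)                        # standard H2O accumulation
windowed_scores = score_kv_entries(q, k, window_size=32)    # windowed variant
```

In both cases the resulting scores are used to select which KV entries to keep; the reported observation is that the two scoring variants lead to very similar downstream performance.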