question about kv_cache #196

hrz2000 · 2025-01-20T16:38:11Z

Line 106 in 8d1647c

return self.key_cache[layer_idx], self.value_cache[layer_idx]

Why are the timestep and image_latents (collectively referred to as x) not kept when the kv_cache is recorded, while only the condition c is? As a result, the first inference timestep does not perform self-attention of x on x; it only performs cross-attention of x on c.
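To make the concern concrete, here is a minimal toy sketch of the first step. It is not the repository's code. It assumes the first-step sequence is laid out as [c tokens, then x tokens], and the names `first_step_kv`, `num_x_tokens`, and `include_x` are hypothetical. Strings stand in for key tensors.

```python
from typing import List, Tuple

Tokens = List[str]  # stand-in for a key tensor along the sequence axis


def first_step_kv(keys: Tokens, num_x_tokens: int,
                  include_x: bool) -> Tuple[Tokens, Tokens]:
    """Return (cached_keys, attended_keys) for the first inference step.

    Assumption: keys are ordered as [condition c ..., x ...], where x is
    the timestep token followed by the image latents.
    """
    if num_x_tokens <= 0:
        raise ValueError("num_x_tokens must be positive")
    # Only the condition tokens are stored for reuse at later steps.
    cached = keys[:-num_x_tokens]
    # include_x=False mimics returning the cache itself on the first step:
    # the x queries then see only c.
    # include_x=True returns the full sequence, so x also attends to x.
    attended = keys if include_x else cached
    return cached, attended


keys = ["c0", "c1", "c2", "t", "img0", "img1"]  # 3 condition tokens, then x

_, attended = first_step_kv(keys, num_x_tokens=3, include_x=False)
print(attended)  # keys visible to x when only the cache is returned

_, attended_full = first_step_kv(keys, num_x_tokens=3, include_x=True)
print(attended_full)  # keys visible to x when the full sequence is returned
```

In both variants the cache holds only c. The difference is whether the keys returned to attention on the first step still include x. Values would be handled the same way as keys.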

staoxiao · 2025-01-24T07:13:25Z

Hi @hrz2000, thank you for pointing out this issue. It appears to be a bug in the inference process, and I will carefully check this part of the code later.
