LLM caching: Bug when sharing prefix in target (#1048)
When caching labels we assume that every encoded label ends with an EOS token, but not every tokenizer guarantees this; the GPT2 tokenizer, for example, does not append one. Without the EOS token, labels that share a prefix, e.g. '11' and '11111' (= '11' + '111'), both get cache entries for the shared prefix '11' while having different total token lengths (in this case 1 vs. 2 tokens). As a result, when generating logits for the label '11', a 'next' cache entry (created for '111') exists even though no label tokens are left. Since the code only checks for the EOS token, which is not present, it runs into an index error. The fix is to additionally check whether the label we want logits for has already been completely consumed.
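The sketch below illustrates the extra stopping condition under simplified assumptions; the names (`collect_label_logits`, `prefix_cache`) and the flat dict cache are hypothetical and only stand in for the repository's actual caching code.

```python
EOS = 0  # pretend EOS token id; GPT2-style tokenizers never append it to labels

def collect_label_logits(label_ids, prefix_cache):
    """Walk a label token by token, picking up cached logits for each prefix."""
    collected = []
    prefix = (label_ids[0],)
    while prefix in prefix_cache:
        collected.append(prefix_cache[prefix])
        if prefix[-1] == EOS:
            break  # old check: stop on EOS (never triggers for GPT2-style labels)
        if len(prefix) == len(label_ids):
            break  # new check: stop once the label is completely consumed
        # without the check above, this line raised an IndexError for the shorter
        # label once a longer sibling had already populated the cache
        prefix = prefix + (label_ids[len(prefix)],)
    return collected

# '11' encodes to one token, '11111' to two; they share the first token
label_short, label_long = [7], [7, 8]
cache = {(7,): "logits(7)", (7, 8): "logits(7,8)"}  # filled while scoring '11111'
print(collect_label_logits(label_short, cache))  # ['logits(7)'] instead of IndexError
print(collect_label_logits(label_long, cache))   # ['logits(7)', 'logits(7,8)']
```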