Does each token require a kNN search during inference? #7
If I use Faiss as the memory, then during inference, does generating each token require 3 kNN searches (because there are 3 memory attention layers)? Will the generation speed become very slow?

Comments

@CStanKonrad Is there a practical example of using external memory?
Regarding the question: the suggested kNN implementation retrieves, for each query in a memory layer, the k best-matching keys from the memory cache. In the 3B model there are 3 memory layers, each with 32 heads, which gives 96 retrievals per token. In general, we recommend the brute-force approach (full attention, no kNN; an example of this approach is implemented in this repository) for memories that fit on the GPU. However, if you want to use Faiss, you will need to tune the index manually (note that the faster Faiss indexes have a training stage and let you balance speed against retrieval accuracy). We currently do not provide practical examples with Faiss. Example times obtained on a 40GB A100 GPU with bfloat16 precision using code from this repository:
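To make the two options above concrete, here is a minimal, CPU-only sketch, not code from this repository: the head dimension, memory length, and top-k values are illustrative assumptions, while the 3-layers-by-32-heads figure comes from the comment above. It contrasts brute-force attention over the full memory cache with a per-head Faiss lookup.

```python
import numpy as np
import torch
import faiss

# Sizes below are illustrative assumptions, except that 3 memory layers
# x 32 heads matches the 3B model described above (3 * 32 = 96 searches
# per generated token in the kNN variant).
NUM_MEM_LAYERS = 3
NUM_HEADS = 32
HEAD_DIM = 128        # assumed head dimension
MEM_LEN = 16384       # assumed number of cached (key, value) pairs per head
TOP_K = 128           # assumed number of neighbours retrieved per query

def brute_force_memory_attention(q, mem_k, mem_v):
    """Full attention over the whole memory cache (the approach recommended
    above when the memory fits on GPU).
    q: (heads, head_dim); mem_k, mem_v: (heads, mem_len, head_dim)."""
    scores = torch.einsum("hd,hmd->hm", q, mem_k) / HEAD_DIM ** 0.5
    return torch.einsum("hm,hmd->hd", scores.softmax(dim=-1), mem_v)

def build_indexes(mem_k):
    """One exact inner-product Faiss index per head."""
    indexes = []
    for h in range(NUM_HEADS):
        index = faiss.IndexFlatIP(HEAD_DIM)
        index.add(mem_k[h])   # float32 array of shape (mem_len, head_dim)
        indexes.append(index)
    return indexes

def knn_memory_attention(q, indexes, mem_v):
    """kNN variant for one memory layer: one Faiss search per head, i.e.
    NUM_MEM_LAYERS * NUM_HEADS = 96 searches per generated token."""
    outs = []
    for h in range(NUM_HEADS):
        # With IndexFlatIP the returned scores are the inner products q . k,
        # so they can be reused directly as (unscaled) attention logits.
        scores, idx = indexes[h].search(q[h:h + 1], TOP_K)
        weights = torch.from_numpy(scores[0] / HEAD_DIM ** 0.5).softmax(dim=-1)
        retrieved_v = torch.from_numpy(mem_v[h, idx[0]])  # (TOP_K, head_dim)
        outs.append(weights @ retrieved_v)
    return torch.stack(outs)

# One layer's memory and one token's queries (float32, as Faiss requires).
mem_k = np.random.randn(NUM_HEADS, MEM_LEN, HEAD_DIM).astype("float32")
mem_v = np.random.randn(NUM_HEADS, MEM_LEN, HEAD_DIM).astype("float32")
q = np.random.randn(NUM_HEADS, HEAD_DIM).astype("float32")

exact = brute_force_memory_attention(
    torch.from_numpy(q), torch.from_numpy(mem_k), torch.from_numpy(mem_v))
approx = knn_memory_attention(q, build_indexes(mem_k), mem_v)
print(exact.shape, approx.shape)  # both (32, 128)
```

Note that IndexFlatIP is itself an exact (brute-force) index; the faster approximate indexes (e.g. faiss.IndexIVFFlat) are the ones that require a training stage and trade retrieval accuracy for speed, which is why the comment above says the index must be tuned manually.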
Got it, thanks!