
Update batching explanation in docs (#36)
* Update embeddings documentation to be more clear

* Update README.md

* Fix markdown formatting

* Update README.md
matthewkotila authored Aug 13, 2024
1 parent e1455e0 commit a812e25
Showing 2 changed files with 19 additions and 0 deletions.
7 changes: 7 additions & 0 deletions genai-perf/README.md
@@ -335,6 +335,13 @@ You can optionally set additional model inputs with the following option:
model with a singular value, such as `stream:true` or `max_tokens:5`. This
flag can be repeated to supply multiple extra inputs.

For [Large Language Models](docs/tutorial.md), there is no batch size (i.e.
the batch size is always `1`): each request includes the inputs for one
individual inference. Other endpoints, such as [embeddings](docs/embeddings.md)
and [rankings](docs/rankings.md), support client-side batching, where
`--batch-size N` means that each request sent will include the inputs for `N`
separate inferences, allowing them to be processed together.
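As a sketch of the distinction, a client-side batch of `N` inputs can travel in one request whose `input` field carries all the texts. The field names below follow the OpenAI embeddings API (`model`, `input`, which accepts either a single string or a list of strings); the exact payload genai-perf emits may differ:

```python
import json

def build_embeddings_request(model, texts):
    # One request carrying the inputs for len(texts) separate inferences.
    # The OpenAI embeddings API accepts a single string or a list of strings
    # for "input"; passing a list is what client-side batching amounts to.
    return {"model": model, "input": texts}

batched = build_embeddings_request(
    "intfloat/e5-mistral-7b-instruct",
    ["first passage to embed", "second passage to embed"],
)
print(json.dumps(batched))
```

With `--batch-size 1` (the default for these endpoints), each request would instead carry a single text.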

<br/>

<!--
12 changes: 12 additions & 0 deletions genai-perf/docs/embeddings.md
@@ -68,6 +68,18 @@ genai-perf profile \
--input-file embeddings.jsonl
```

* `-m intfloat/e5-mistral-7b-instruct` specifies the model to run
  (`intfloat/e5-mistral-7b-instruct`)
* `--service-kind openai` specifies that the server is OpenAI-API
  compatible
* `--endpoint-type embeddings` specifies that requests should be formatted to
  follow the [embeddings
  API](https://platform.openai.com/docs/api-reference/embeddings/create)
* `--batch-size 2` specifies that each request will contain the inputs for 2
  individual inferences, making a client-side batch size of 2
* `--input-file embeddings.jsonl` specifies the input data to be used for
  inferencing
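The input file referenced above is a JSONL file with one JSON object per line. As a minimal sketch of generating such a file, assuming each line carries a single `"text"` field (check this tutorial's input-file section for the exact schema genai-perf expects):

```python
import json

# Hypothetical sample passages; in practice these would be your own texts.
passages = [
    "What was the first car ever driven?",
    "Who discovered penicillin?",
]

# Write one JSON object per line, i.e. the JSONL format.
with open("embeddings.jsonl", "w") as f:
    for passage in passages:
        f.write(json.dumps({"text": passage}) + "\n")
```

With `--batch-size 2`, genai-perf would then pack the inputs for two such lines into each request it sends.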

This will use default values for optional arguments. You can also pass in
additional arguments with the `--extra-inputs` [flag](../README.md#input-options).
For example, you could use this command:
