Commit 5663952: Remove all deploy on baseten buttons

relph committed Jan 10, 2024
1 parent 9b2ccdc commit 5663952

Showing 15 changed files with 154 additions and 159 deletions.
17 changes: 9 additions & 8 deletions alpaca-7b/README.md
@@ -1,8 +1,7 @@
# Alpaca-7B Truss

This is a [Truss](https://truss.baseten.co/) for Alpaca-7B, a fine-tuned variant of LLaMA-7B. LLaMA is a family of language models released by Meta. This README will walk you through how to deploy this Truss on Baseten to get your own instance of Alpaca-7B.

## Deploy Alpaca-7B

First, clone this repository:
@@ -28,16 +27,18 @@ Paste your Baseten API key if prompted.
For more information, see [Truss documentation](https://truss.baseten.co).

## Alpaca-7B API documentation

This section provides an overview of the Alpaca-7B API, its parameters, and how to use it. The API consists of a single route named `predict`, which you can invoke to generate text based on the provided instruction.

### API route: `predict`

The predict route is the primary method for generating text completions based on a given instruction. It takes several parameters:

- **instruction**: The input text that you want the model to generate a response for.
- **temperature** (optional, default=0.1): Controls the randomness of the generated text. Higher values produce more diverse results, while lower values produce more deterministic results.
- **top_p** (optional, default=0.75): The cumulative probability threshold for nucleus sampling. The model samples only from the smallest set of tokens whose cumulative probability exceeds this threshold.
- **top_k** (optional, default=40): The number of top tokens to consider when sampling. The model will only consider the top_k highest-probability tokens.
- **num_beams** (optional, default=4): The number of beams used for beam search. Increasing this value can result in higher-quality output but will increase the computational cost.

The API also supports passing any parameter supported by Hugging Face's `transformers.generate`.
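
For illustration, a minimal Python call to a deployed instance of this Truss might look like the sketch below; the model ID is a placeholder, and the endpoint shape is an assumption to check against your Baseten dashboard:

```python
import os

import requests

MODEL_ID = "abcd1234"  # placeholder; find yours in your Baseten workspace

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        "instruction": "Explain what a Truss is in one sentence.",
        "temperature": 0.1,
        "top_p": 0.75,
        "top_k": 40,
        "num_beams": 4,
    },
)
resp.raise_for_status()
print(resp.json())
```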

16 changes: 9 additions & 7 deletions deepfloyd-xl/README.md
@@ -1,5 +1,3 @@
# DeepFloyd XL Truss

This is a [Truss](https://truss.baseten.co/) for DeepFloyd-IF, a pixel-based, triple-cascaded text-to-image diffusion model with strong photorealism and language understanding. It achieves a zero-shot FID-30K score of 6.66 on the COCO dataset, outperforming prior state-of-the-art models.
@@ -50,20 +48,24 @@ For more information, see [Truss documentation](https://truss.baseten.co).

This deployment of DeepFloyd takes a dictionary as input, which requires the following key:

- `prompt` - the prompt for image generation

It also supports a number of other parameters detailed in [this blog post](https://huggingface.co/blog/if).

### Output

The result will be a dictionary containing:

- `status` - either `success` or `failed`
- `data` - a list of base64-encoded images
- `message` - will contain details in the case of errors

```json
{"status": "success", "data": ["/9j/4AAQSkZJRgABAQAAAQABAA...."], "message": null}
{
"status": "success",
"data": ["/9j/4AAQSkZJRgABAQAAAQABAA...."],
"message": null
}
```
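
Because `data` holds base64-encoded images, the client needs a decoding step. A minimal sketch, where `response` stands in for a parsed JSON reply like the one above (the payload here is a tiny valid stand-in, not real image data):

```python
import base64

# Stand-in for a parsed response; real replies carry JPEG bytes in `data`.
response = {
    "status": "success",
    "data": [base64.b64encode(b"not a real JPEG").decode()],
    "message": None,
}

# Decode each image and write it to disk.
for i, image_b64 in enumerate(response["data"]):
    with open(f"output_{i}.jpg", "wb") as f:
        f.write(base64.b64decode(image_b64))
```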

## Example usage
23 changes: 11 additions & 12 deletions flan-t5-xl/README.md
@@ -1,16 +1,14 @@
# FLAN-T5 XL Truss

[Flan-T5 XL](https://huggingface.co/google/flan-t5-xl?text=Q%3A+%28+False+or+not+False+or+False+%29+is%3F+A%3A+Let%27s+think+step+by+step) is an open-source large language model developed by Google.

Flan-T5 XL has a number of use cases such as:

- Sentiment analysis
- Paraphrasing/sentence similarity
- Natural language inference
- Sentence completion
- Question answering

Flan-T5 XL is similar to T5 except that it is "instruction tuned". In practice, this means the model is comparable to GPT-3 on multitask benchmarks because it is fine-tuned to follow human instructions.

@@ -37,14 +35,15 @@ truss push
Paste your Baseten API key if prompted.

For more information, see [Truss documentation](https://truss.baseten.co).

## FLAN-T5 XL API documentation

### Input

The input should be a list of dictionaries, each of which may contain the following keys:

- `prompt` - the prompt for text generation
- `bad_words` - an optional list of strings to avoid in the generated output

The [official documentation](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.generation_utils.GenerationMixin.generate) has information on additional parameters.
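
For example, a request body pairing a prompt with the optional `bad_words` list might look like this sketch (the values are illustrative):

```python
# Each dictionary in the list is one generation request.
payload = [
    {
        "prompt": "Answer the question: which states border Connecticut?",
        "bad_words": ["basically"],  # optional: strings to avoid in the output
    }
]
```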

@@ -59,9 +58,9 @@

The result will be a dictionary containing:

- `status` - either `success` or `failed`
- `data` - the output text
- `message` - will contain details in the case of errors

```json
{
  "status": "success",
  "data": "...",
  "message": null
}
```
11 changes: 5 additions & 6 deletions gfp-gan/README.md
@@ -1,5 +1,3 @@
# GFP-GAN Truss

This is a [Truss](https://truss.baseten.co/) for serving an implementation of TencentARC's GFP-GAN, a model for practical face restoration.
@@ -38,15 +36,16 @@ For more information, see [Truss documentation](https://truss.baseten.co).
### Input

The input should be a dictionary with the following key:

- `image` - the image to be restored, encoded as base64.
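
For example, building the request body from a local file might look like this sketch (the file name is a placeholder):

```python
import base64

# Read a degraded photo and base64-encode it for the `image` key.
with open("old_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {"image": image_b64}
```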

### Output

The model returns a dictionary containing the base64-encoded restored image:

- `status` - either `success` or `failed`
- `data` - the restored image, encoded as base64
- `message` - will contain details in the case of errors

## Example usage

52 changes: 27 additions & 25 deletions gpt-j/README.md
@@ -1,5 +1,3 @@
# GPT-J Truss

This is an implementation of EleutherAI's GPT-J.
@@ -38,46 +36,50 @@ For more information, see [Truss documentation](https://truss.baseten.co).

The input should be a list of dictionaries, each of which must contain the following key:

- `prompt` - the prompt for text generation

Additionally, the following optional parameters are supported and passed through to the `generate` method. For more details, see the [official documentation](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.generation_utils.GenerationMixin.generate).

- `max_length` - int - limited to 512
- `min_length` - int - limited to 64
- `do_sample` - bool
- `early_stopping` - bool
- `num_beams` - int
- `temperature` - float
- `top_k` - int
- `top_p` - float
- `repetition_penalty` - float
- `length_penalty` - float
- `encoder_no_repeat_ngram_size` - int
- `num_return_sequences` - int
- `max_time` - float
- `num_beam_groups` - int
- `diversity_penalty` - float
- `remove_invalid_values` - bool

Here's an example input:

```json
{
"prompt": "If I was a billionaire, I would",
"max_length": 50
"prompt": "If I was a billionaire, I would",
"max_length": 50
}
```

### Output

The result will be a dictionary containing:

- `status` - either `success` or `failed`
- `data` - the output text
- `message` - will contain details in the case of errors

```json
{"status": "success", "data": "If I was a billionaire, I would buy a plane.", "message": null}
{
"status": "success",
"data": "If I was a billionaire, I would buy a plane.",
"message": null
}
```
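
Tying the parameters and output together, here is a hedged sketch of a call that enables sampling and reads the generated text; the model ID and endpoint shape are placeholders to verify against your Baseten dashboard:

```python
import os

import requests

MODEL_ID = "abcd1234"  # placeholder

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        "prompt": "If I was a billionaire, I would",
        "do_sample": True,
        "temperature": 0.8,
        "max_length": 64,
    },
)
body = resp.json()
if body["status"] == "success":
    print(body["data"])
else:
    print("Generation failed:", body["message"])
```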

## Example usage
18 changes: 8 additions & 10 deletions llama/llama-2-7b-trt-llm/README.md
@@ -1,5 +1,3 @@
# LLaMA2-7B-Chat Truss

This is a [Truss](https://truss.baseten.co/) for an int8 SmoothQuant version of LLaMA2-7B-Chat. Llama is a family of language models released by Meta. This README will walk you through how to deploy this Truss on Baseten to get your own instance of LLaMA2-7B-Chat.
@@ -35,19 +33,19 @@ Paste your Baseten API key if prompted.
For more information, see [Truss documentation](https://truss.baseten.co).

## LLaMA2-7B API documentation

This section provides an overview of the LLaMA2-7B API, its parameters, and how to use it. The API consists of a single route named `predict`, which you can invoke to generate text based on the provided instruction.

### API route: `predict`

We expect requests with the following information:

- `prompt` (str): The prompt you'd like to complete
- `max_tokens` (int, default: 50): The maximum number of tokens to generate. This count includes the tokens in your prompt, so if this value is less than the length of your prompt, you'll just receive a truncated version of the prompt.
- `beam_width` (int, default: 1): The number of beams to compute. This must be 1 for this version of TRT-LLM; in-flight batching does not support beams > 1.
- `bad_words_list` (list, default: []): A list of words to exclude from the generated output.
- `stop_words_list` (list, default: []): A list of words that stop generation when encountered.
- `repetition_penalty` (float, default: 1.0): A repetition penalty that discourages repeating tokens.

This Truss streams its response back as buffered chunks of text.
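
A sketch of consuming that stream with `requests` follows; the model ID and endpoint shape are placeholders, and the exact chunk framing may differ in practice:

```python
import os

import requests

MODEL_ID = "abcd1234"  # placeholder

with requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={"prompt": "What is int8 SmoothQuant?", "max_tokens": 200},
    stream=True,
) as resp:
    for chunk in resp.iter_content(chunk_size=None):
        # Each chunk is a buffered piece of generated text.
        print(chunk.decode("utf-8"), end="", flush=True)
```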

16 changes: 8 additions & 8 deletions llama/llama-7b/README.md
@@ -1,5 +1,3 @@
# LLaMA-7B Truss

This is a [Truss](https://truss.baseten.co/) for an int8 version of LLaMA-7B. Llama is a family of language models released by Meta. This README will walk you through how to deploy this Truss on Baseten to get your own instance of LLaMA-7B.
@@ -33,16 +31,18 @@ Paste your Baseten API key if prompted.
For more information, see [Truss documentation](https://truss.baseten.co).

## LLaMA-7B API documentation

This section provides an overview of the LLaMA-7B API, its parameters, and how to use it. The API consists of a single route named `predict`, which you can invoke to generate text based on the provided instruction.

### API route: `predict`

The predict route is the primary method for generating text completions based on a given instruction. It takes several parameters:

- **instruction**: The input text that you want the model to generate a response for.
- **temperature** (optional, default=0.1): Controls the randomness of the generated text. Higher values produce more diverse results, while lower values produce more deterministic results.
- **top_p** (optional, default=0.75): The cumulative probability threshold for nucleus sampling. The model samples only from the smallest set of tokens whose cumulative probability exceeds this threshold.
- **top_k** (optional, default=40): The number of top tokens to consider when sampling. The model will only consider the top_k highest-probability tokens.
- **num_beams** (optional, default=4): The number of beams used for beam search. Increasing this value can result in higher-quality output but will increase the computational cost.

The API also supports passing any parameter supported by Hugging Face's `transformers.generate`.
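
For example, a request body mixing the documented parameters with an extra `generate` argument might look like this sketch (the values are illustrative):

```python
# `repetition_penalty` is not listed above but is accepted because it is
# a valid Hugging Face `generate` parameter.
payload = {
    "instruction": "Write a haiku about mountains.",
    "temperature": 0.2,
    "num_beams": 4,
    "repetition_penalty": 1.1,
}
```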

17 changes: 8 additions & 9 deletions mistral/mistral-7b-instruct-chat-trt-llm-smooth-quant/README.md
@@ -1,5 +1,3 @@
# Mistral-7B-Instruct-Chat Truss

This is a [Truss](https://truss.baseten.co/) for Mistral 7B Instruct. This README will walk you through how to deploy this Truss on Baseten to get your own instance of Mistral 7B Instruct.
@@ -35,7 +33,8 @@ Paste your Baseten API key if prompted.
For more information, see [Truss documentation](https://truss.baseten.co).

## Mistral 7B Instruct API documentation

This section provides an overview of the Mistral 7B Instruct API, its parameters, and how to use it. The API consists of a single route named `predict`, which you can invoke to generate text based on the provided instruction.

### API route: `predict`

@@ -46,11 +45,11 @@ This model is designed for our ChatCompletions endpoint:

We expect requests with the following information:

- `messages` (str): The prompt you'd like to complete
- `max_tokens` (int, default: 50): The maximum number of tokens to generate. This count includes the tokens in your prompt, so if this value is less than the length of your prompt, you'll just receive a truncated version of the prompt.
- `beam_width` (int, default: 1): The number of beams to compute. This must be 1 for this version of TRT-LLM; in-flight batching does not support beams > 1.
- `bad_words_list` (list, default: []): A list of words to exclude from the generated output.
- `stop_words_list` (list, default: []): A list of words that stop generation when encountered.
- `repetition_penalty` (float, default: 1.0): A repetition penalty that discourages repeating tokens.

This Truss streams its response back as buffered chunks of text.
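
For illustration, a ChatCompletions-style request body might look like the sketch below; the exact message schema is an assumption to verify against your deployment:

```python
# The nested role/content structure is an assumption based on the
# ChatCompletions endpoint named above.
payload = {
    "messages": [
        {"role": "user", "content": "What makes Mistral 7B fast at inference?"}
    ],
    "max_tokens": 512,
}
```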