Commit 5663952: Remove all deploy on baseten buttons

relph committed Jan 10, 2024
1 parent 9b2ccdc commit 5663952

Showing 15 changed files with 154 additions and 159 deletions.
17 changes: 9 additions & 8 deletions alpaca-7b/README.md
@@ -1,8 +1,7 @@
# Alpaca-7B Truss

This is a [Truss](https://truss.baseten.co/) for Alpaca-7B, a fine-tuned variant of LLaMA-7B. LLaMA is a family of language models released by Meta. This README will walk you through how to deploy this Truss on Baseten to get your own instance of Alpaca-7B.

## Deploy Alpaca-7B

First, clone this repository:
@@ -28,16 +27,18 @@ Paste your Baseten API key if prompted.
For more information, see [Truss documentation](https://truss.baseten.co).

## Alpaca-7B API documentation

This section provides an overview of the Alpaca-7B API, its parameters, and how to use it. The API consists of a single route named `predict`, which you can invoke to generate text based on the provided instruction.

### API route: `predict`

The predict route is the primary method for generating text completions based on a given instruction. It takes several parameters:

- **instruction**: The input text that you want the model to generate a response for.
- **temperature** (optional, default=0.1): Controls the randomness of the generated text. Higher values produce more diverse results, while lower values produce more deterministic results.
- **top_p** (optional, default=0.75): The cumulative probability threshold for nucleus sampling. The model samples only from the smallest set of tokens whose cumulative probability exceeds this threshold.
- **top_k** (optional, default=40): The number of top tokens to consider when sampling. The model will only consider the top_k highest-probability tokens.
- **num_beams** (optional, default=4): The number of beams used for beam search. Increasing this value can result in higher-quality output but will increase the computational cost.

The API also supports passing any parameter supported by Hugging Face's `transformers.generate`.
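
For illustration, a minimal Python call to a deployed instance of this Truss might look like the sketch below; the model ID is a placeholder, and the endpoint shape is an assumption to check against your Baseten dashboard:

```python
import os

import requests

MODEL_ID = "abcd1234"  # placeholder; find yours in your Baseten workspace

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        "instruction": "Explain what a Truss is in one sentence.",
        "temperature": 0.1,
        "top_p": 0.75,
        "top_k": 40,
        "num_beams": 4,
    },
)
resp.raise_for_status()
print(resp.json())
```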

16 changes: 9 additions & 7 deletions deepfloyd-xl/README.md
@@ -1,5 +1,3 @@
# DeepFloyd XL Truss

This is a [Truss](https://truss.baseten.co/) for DeepFloyd-IF, a pixel-based, triple-cascaded text-to-image diffusion model with strong photorealism and language understanding. It achieves a zero-shot FID-30K score of 6.66 on the COCO dataset, outperforming prior state-of-the-art models.
@@ -50,20 +48,24 @@ For more information, see [Truss documentation](https://truss.baseten.co).

This deployment of DeepFloyd takes a dictionary as input, which requires the following key:

- `prompt` - the prompt for image generation

It also supports a number of other parameters detailed in [this blog post](https://huggingface.co/blog/if).

### Output

The result will be a dictionary containing:

- `status` - either `success` or `failed`
- `data` - a list of base64-encoded images
- `message` - will contain details in the case of errors

```json
{"status": "success", "data": ["/9j/4AAQSkZJRgABAQAAAQABAA...."], "message": null}
{
"status": "success",
"data": ["/9j/4AAQSkZJRgABAQAAAQABAA...."],
"message": null
}
```
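
Because `data` holds base64-encoded images, the client needs a decoding step. A minimal sketch, where `response` stands in for a parsed JSON reply like the one above (the payload here is a tiny valid stand-in, not real image data):

```python
import base64

# Stand-in for a parsed response; real replies carry JPEG bytes in `data`.
response = {
    "status": "success",
    "data": [base64.b64encode(b"not a real JPEG").decode()],
    "message": None,
}

# Decode each image and write it to disk.
for i, image_b64 in enumerate(response["data"]):
    with open(f"output_{i}.jpg", "wb") as f:
        f.write(base64.b64decode(image_b64))
```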

## Example usage
23 changes: 11 additions & 12 deletions flan-t5-xl/README.md
@@ -1,16 +1,14 @@
# FLAN-T5 XL Truss

[Flan-T5 XL](https://huggingface.co/google/flan-t5-xl?text=Q%3A+%28+False+or+not+False+or+False+%29+is%3F+A%3A+Let%27s+think+step+by+step) is an open-source large language model developed by Google.

Flan-T5 XL has a number of use cases such as:

- Sentiment analysis
- Paraphrasing/sentence similarity
- Natural language inference
- Sentence completion
- Question answering

Flan-T5 XL is similar to T5 except that it is "instruction tuned". In practice, this means the model is comparable to GPT-3 on multitask benchmarks because it is fine-tuned to follow human instructions.

@@ -37,14 +35,15 @@ truss push
Paste your Baseten API key if prompted.

For more information, see [Truss documentation](https://truss.baseten.co).

## FLAN-T5 XL API documentation

### Input

The input should be a list of dictionaries, each of which may contain the following keys:

- `prompt` - the prompt for text generation
- `bad_words` - an optional list of strings to avoid in the generated output

The [official documentation](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.generation_utils.GenerationMixin.generate) has information on additional parameters.
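
For example, a request body pairing a prompt with the optional `bad_words` list might look like this sketch (the values are illustrative):

```python
# Each dictionary in the list is one generation request.
payload = [
    {
        "prompt": "Answer the question: which states border Connecticut?",
        "bad_words": ["basically"],  # optional: strings to avoid in the output
    }
]
```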

@@ -59,9 +58,9 @@

The result will be a dictionary containing:

- `status` - either `success` or `failed`
- `data` - the output text
- `message` - will contain details in the case of errors

```json
{
  "status": "success",
  "data": "...",
  "message": null
}
```
11 changes: 5 additions & 6 deletions gfp-gan/README.md
@@ -1,5 +1,3 @@
# GFP-GAN Truss

This is a [Truss](https://truss.baseten.co/) for serving an implementation of TencentARC's GFP-GAN, a model for practical face restoration.
@@ -38,15 +36,16 @@ For more information, see [Truss documentation](https://truss.baseten.co).
### Input

The input should be a dictionary with the following key:

- `image` - the image to be restored, encoded as base64.
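
For example, building the request body from a local file might look like this sketch (the file name is a placeholder):

```python
import base64

# Read a degraded photo and base64-encode it for the `image` key.
with open("old_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {"image": image_b64}
```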

### Output

The model returns a dictionary containing the base64-encoded restored image:

- `status` - either `success` or `failed`
- `data` - the restored image, encoded as base64
- `message` - will contain details in the case of errors

## Example usage

52 changes: 27 additions & 25 deletions gpt-j/README.md
@@ -1,5 +1,3 @@
# GPT-J Truss

This is an implementation of EleutherAI's GPT-J.
@@ -38,46 +36,50 @@ For more information, see [Truss documentation](https://truss.baseten.co).

The input should be a list of dictionaries, each of which must contain the following key:

- `prompt` - the prompt for text generation

Additionally, the following optional parameters are supported and passed through to the `generate` method. For more details, see the [official documentation](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.generation_utils.GenerationMixin.generate).

- `max_length` - int - limited to 512
- `min_length` - int - limited to 64
- `do_sample` - bool
- `early_stopping` - bool
- `num_beams` - int
- `temperature` - float
- `top_k` - int
- `top_p` - float
- `repetition_penalty` - float
- `length_penalty` - float
- `encoder_no_repeat_ngram_size` - int
- `num_return_sequences` - int
- `max_time` - float
- `num_beam_groups` - int
- `diversity_penalty` - float
- `remove_invalid_values` - bool

Here's an example input:

```json
{
"prompt": "If I was a billionaire, I would",
"max_length": 50
"prompt": "If I was a billionaire, I would",
"max_length": 50
}
```

### Output

The result will be a dictionary containing:

- `status` - either `success` or `failed`
- `data` - the output text
- `message` - will contain details in the case of errors

```json
{"status": "success", "data": "If I was a billionaire, I would buy a plane.", "message": null}
{
"status": "success",
"data": "If I was a billionaire, I would buy a plane.",
"message": null
}
```
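
Tying the parameters and output together, here is a hedged sketch of a call that enables sampling and reads the generated text; the model ID and endpoint shape are placeholders to verify against your Baseten dashboard:

```python
import os

import requests

MODEL_ID = "abcd1234"  # placeholder

resp = requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        "prompt": "If I was a billionaire, I would",
        "do_sample": True,
        "temperature": 0.8,
        "max_length": 64,
    },
)
body = resp.json()
if body["status"] == "success":
    print(body["data"])
else:
    print("Generation failed:", body["message"])
```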

## Example usage
18 changes: 8 additions & 10 deletions llama/llama-2-7b-trt-llm/README.md
@@ -1,5 +1,3 @@
# LLaMA2-7B-Chat Truss

This is a [Truss](https://truss.baseten.co/) for an int8 SmoothQuant version of LLaMA2-7B-Chat. Llama is a family of language models released by Meta. This README will walk you through how to deploy this Truss on Baseten to get your own instance of LLaMA2-7B-Chat.
@@ -35,19 +33,19 @@ Paste your Baseten API key if prompted.
For more information, see [Truss documentation](https://truss.baseten.co).

## LLaMA2-7B API documentation

This section provides an overview of the LLaMA2-7B API, its parameters, and how to use it. The API consists of a single route named `predict`, which you can invoke to generate text based on the provided instruction.

### API route: `predict`

We expect requests with the following information:

- `prompt` (str): The prompt you'd like to complete
- `max_tokens` (int, default: 50): The maximum number of tokens to generate. This count includes the tokens in your prompt, so if this value is less than the length of your prompt, you'll just receive a truncated version of the prompt.
- `beam_width` (int, default: 1): The number of beams to compute. This must be 1 for this version of TRT-LLM; in-flight batching does not support beams > 1.
- `bad_words_list` (list, default: []): A list of words to exclude from the generated output.
- `stop_words_list` (list, default: []): A list of words that stop generation when encountered.
- `repetition_penalty` (float, default: 1.0): A repetition penalty that discourages repeating tokens.

This Truss streams its response back as buffered chunks of text.
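
A sketch of consuming that stream with `requests` follows; the model ID and endpoint shape are placeholders, and the exact chunk framing may differ in practice:

```python
import os

import requests

MODEL_ID = "abcd1234"  # placeholder

with requests.post(
    f"https://model-{MODEL_ID}.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={"prompt": "What is int8 SmoothQuant?", "max_tokens": 200},
    stream=True,
) as resp:
    for chunk in resp.iter_content(chunk_size=None):
        # Each chunk is a buffered piece of generated text.
        print(chunk.decode("utf-8"), end="", flush=True)
```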

16 changes: 8 additions & 8 deletions llama/llama-7b/README.md
@@ -1,5 +1,3 @@
# LLaMA-7B Truss

This is a [Truss](https://truss.baseten.co/) for an int8 version of LLaMA-7B. Llama is a family of language models released by Meta. This README will walk you through how to deploy this Truss on Baseten to get your own instance of LLaMA-7B.
@@ -33,16 +31,18 @@ Paste your Baseten API key if prompted.
For more information, see [Truss documentation](https://truss.baseten.co).

## LLaMA-7B API documentation

This section provides an overview of the LLaMA-7B API, its parameters, and how to use it. The API consists of a single route named `predict`, which you can invoke to generate text based on the provided instruction.

### API route: `predict`

The predict route is the primary method for generating text completions based on a given instruction. It takes several parameters:

- **instruction**: The input text that you want the model to generate a response for.
- **temperature** (optional, default=0.1): Controls the randomness of the generated text. Higher values produce more diverse results, while lower values produce more deterministic results.
- **top_p** (optional, default=0.75): The cumulative probability threshold for nucleus sampling. The model samples only from the smallest set of tokens whose cumulative probability exceeds this threshold.
- **top_k** (optional, default=40): The number of top tokens to consider when sampling. The model will only consider the top_k highest-probability tokens.
- **num_beams** (optional, default=4): The number of beams used for beam search. Increasing this value can result in higher-quality output but will increase the computational cost.

The API also supports passing any parameter supported by Hugging Face's `transformers.generate`.
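
For example, a request body mixing the documented parameters with an extra `generate` argument might look like this sketch (the values are illustrative):

```python
# `repetition_penalty` is not listed above but is accepted because it is
# a valid Hugging Face `generate` parameter.
payload = {
    "instruction": "Write a haiku about mountains.",
    "temperature": 0.2,
    "num_beams": 4,
    "repetition_penalty": 1.1,
}
```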

17 changes: 8 additions & 9 deletions mistral/mistral-7b-instruct-chat-trt-llm-smooth-quant/README.md
@@ -1,5 +1,3 @@
# Mistral-7B-Instruct-Chat Truss

This is a [Truss](https://truss.baseten.co/) for Mistral 7B Instruct. This README will walk you through how to deploy this Truss on Baseten to get your own instance of Mistral 7B Instruct.
@@ -35,7 +33,8 @@ Paste your Baseten API key if prompted.
For more information, see [Truss documentation](https://truss.baseten.co).

## Mistral 7B Instruct API documentation

This section provides an overview of the Mistral 7B Instruct API, its parameters, and how to use it. The API consists of a single route named `predict`, which you can invoke to generate text based on the provided instruction.

### API route: `predict`

@@ -46,11 +45,11 @@ This model is designed for our ChatCompletions endpoint:

We expect requests with the following information:

- `messages` (str): The prompt you'd like to complete
- `max_tokens` (int, default: 50): The maximum number of tokens to generate. This count includes the tokens in your prompt, so if this value is less than the length of your prompt, you'll just receive a truncated version of the prompt.
- `beam_width` (int, default: 1): The number of beams to compute. This must be 1 for this version of TRT-LLM; in-flight batching does not support beams > 1.
- `bad_words_list` (list, default: []): A list of words to exclude from the generated output.
- `stop_words_list` (list, default: []): A list of words that stop generation when encountered.
- `repetition_penalty` (float, default: 1.0): A repetition penalty that discourages repeating tokens.

This Truss streams its response back as buffered chunks of text.
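
For illustration, a ChatCompletions-style request body might look like the sketch below; the exact message schema is an assumption to verify against your deployment:

```python
# The nested role/content structure is an assumption based on the
# ChatCompletions endpoint named above.
payload = {
    "messages": [
        {"role": "user", "content": "What makes Mistral 7B fast at inference?"}
    ],
    "max_tokens": 512,
}
```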