Adding backticks for code font, extra link to match edits in Markdown… #458

Merged 1 commit on Jun 17, 2024
46 changes: 23 additions & 23 deletions site/en/gemini-api/docs/vision.ipynb
@@ -179,15 +179,15 @@
"\n",
"Images must be in one of the following image data [MIME types](https://developers.google.com/drive/api/guides/ref-export-formats):\n",
"\n",
- "- PNG - image/png\n",
- "- JPEG - image/jpeg\n",
- "- WEBP - image/webp\n",
- "- HEIC - image/heic\n",
- "- HEIF - image/heif\n",
+ "- PNG - `image/png`\n",
+ "- JPEG - `image/jpeg`\n",
+ "- WEBP - `image/webp`\n",
+ "- HEIC - `image/heic`\n",
+ "- HEIF - `image/heif`\n",
"\n",
"Each image is equivalent to 258 tokens.\n",
"\n",
- "While there are no specific limits to the number of pixels in an image besides the model’s context window, larger images are scaled down to a maximum resolution of 3072 x 3072 while preserving their original aspect ratio, while smaller images are scaled up to 768 x 768 pixels. There is no cost reduction for images at lower sizes, other than bandwidth, or performance improvement for images at higher resolution.\n",
+ "While there are no specific limits to the number of pixels in an image besides the model’s context window, larger images are scaled down to a maximum resolution of 3072x3072 while preserving their original aspect ratio, while smaller images are scaled up to 768x768 pixels. There is no cost reduction for images at lower sizes, other than bandwidth, or performance improvement for images at higher resolution.\n",
"\n",
"For best results:\n",
"\n",
@@ -204,11 +204,11 @@
"source": [
"### Upload an image file using the File API\n",
"\n",
- "Use the File API to upload an image of any size. (Always use the File API when the combination of files and system instructions that you intend to send is larger than 20MB.)\n",
+ "Use the File API to upload an image of any size. (Always use the File API when the combination of files and system instructions that you intend to send is larger than 20 MB.)\n",
"\n",
- "**NOTE**: The File API lets you store up to 20GB of files per project, with a per-file maximum size of 2GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but cannot be downloaded from the API. It is available at no cost in all regions where the Gemini API is available.\n",
+ "**NOTE**: The File API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but cannot be downloaded from the API. It is available at no cost in all regions where the Gemini API is available.\n",
"\n",
- "Start by calling this [sketch of a jetpack](https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg)."
+ "Start by downloading this [sketch of a jetpack](https://storage.googleapis.com/generativeai-downloads/images/jetpack.jpg)."
]
},
{
@@ -265,7 +265,7 @@
"source": [
"### Verify image file upload and get metadata\n",
"\n",
- "You can verify the API successfully stored the uploaded file and get its metadata by calling [files.get](https://ai.google.dev/api/rest/v1beta/files/get) through the SDK. Only the `name` (and by extension, the `uri`) are unique. Use `display_name` to identify files only if you manage uniqueness yourself."
+ "You can verify the API successfully stored the uploaded file and get its metadata by calling [`files.get`](https://ai.google.dev/api/rest/v1beta/files/get) through the SDK. Only the `name` (and by extension, the `uri`) are unique. Use `display_name` to identify files only if you manage uniqueness yourself."
]
},
{
@@ -331,7 +331,7 @@
"\n",
"<img width=400 src=\"https://ai.google.dev/tutorials/images/colab_upload.png\">\n",
"\n",
- "When the combination of files and system instructions that you intend to send is larger than 20MB in size, use the File API to upload those files, as previously shown. Smaller files can instead be called locally from the Gemini API:\n"
+ "When the combination of files and system instructions that you intend to send is larger than 20 MB in size, use the File API to upload those files, as previously shown. Smaller files can instead be called locally from the Gemini API:\n"
]
},
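Reviewer aside on the 20 MB rule in the hunk above: a minimal sketch of the size check it describes, assuming sizes are measured in bytes and that "20 MB" means 20 × 1024 × 1024 bytes (the notebook does not say whether the unit is decimal or binary). The names `INLINE_LIMIT_BYTES` and `should_use_file_api` are hypothetical and not part of the Gemini SDK.

```python
# Assumption: "20 MB" is read as 20 MiB; the notebook text does not specify.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def should_use_file_api(file_sizes_bytes, system_instruction_bytes=0):
    """Return True when the combined payload exceeds the inline limit.

    file_sizes_bytes: iterable of file sizes in bytes.
    system_instruction_bytes: size of the system instructions in bytes.
    """
    total = sum(file_sizes_bytes) + system_instruction_bytes
    return total > INLINE_LIMIT_BYTES
```

A caller could gather the sizes with `os.path.getsize` and upload through the File API only when the function returns `True`.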
{
@@ -436,19 +436,19 @@
"Gemini 1.5 Pro and Flash support up to approximately an hour of video data.\n",
"\n",
"Video must be in one of the following video format [MIME types](https://developers.google.com/drive/api/guides/ref-export-formats):\n",
- " - video/mp4\n",
- " - video/mpeg\n",
- " - video/mov\n",
- " - video/avi\n",
- " - video/x-flv\n",
- " - video/mpg\n",
- " - video/webm\n",
- " - video/wmv\n",
- " - video/3gpp\n",
+ " - `video/mp4`\n",
+ " - `video/mpeg`\n",
+ " - `video/mov`\n",
+ " - `video/avi`\n",
+ " - `video/x-flv`\n",
+ " - `video/mpg`\n",
+ " - `video/webm`\n",
+ " - `video/wmv`\n",
+ " - `video/3gpp`\n",
"\n",
"The File API service currently extracts image frames from videos at 1 frame per second (FPS) and audio at 1Kbps, single channel, adding timestamps every second. These rates are subject to change in the future for improvements in inference.\n",
"\n",
- "**NOTE:** The finer details of fast action sequences may be lost at the 1FPS frame sampling rate. Consider slowing down high-speed clips for improved inference quality.\n",
+ "**NOTE:** The finer details of fast action sequences may be lost at the 1 FPS frame sampling rate. Consider slowing down high-speed clips for improved inference quality.\n",
"\n",
"Individual frames are 258 tokens, and audio is 32 tokens per second. With metadata, each second of video becomes ~300 tokens, which means a 1M context window can fit slightly less than an hour of video.\n",
"\n",
@@ -468,7 +468,7 @@
"source": [
"### Upload a video file to the File API\n",
"\n",
- "**NOTE**: The File API lets you store up to 20GB of files per project, with a per-file maximum size of 2GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but they cannot be downloaded using any API. It is available at no cost in all regions where the Gemini API is available.\n",
+ "**NOTE**: The File API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but they cannot be downloaded using any API. It is available at no cost in all regions where the Gemini API is available.\n",
"\n",
"The File API accepts video file formats directly. This example uses the short NASA film [\"Jupiter's Great Red Spot Shrinks and Grows\"](https://www.youtube.com/watch?v=JDi4IdtvDVE0). Credit: Goddard Space Flight Center (GSFC)/David Ladd (2018).\n",
"\n",
@@ -520,7 +520,7 @@
"source": [
"### Verify file upload and check state\n",
"\n",
- "Verify the API has successfully received the files by calling the `files.get` method.\n",
+ "Verify the API has successfully received the files by calling the [`files.get`](https://ai.google.dev/api/rest/v1beta/files/get) method.\n",
"\n",
"**NOTE**: Video files have a `State` field in the File API. When a video is uploaded, it will be in the `PROCESSING` state until it is ready for inference. Only `ACTIVE` files can be used for model inference."
]
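Reviewer aside on the `PROCESSING` / `ACTIVE` note in the hunk above: a minimal polling sketch, assuming the caller supplies a `get_state` callable that returns the file's current state as a string (for example, a thin wrapper around a `files.get` call). The helper `wait_until_active`, its parameters, and the decision to treat any other state as an error are illustrative assumptions, not SDK behavior.

```python
import time


def wait_until_active(get_state, poll_interval_s=5.0, timeout_s=600.0,
                      sleep=time.sleep):
    """Poll get_state() until it reports "ACTIVE".

    get_state: hypothetical zero-argument callable returning the state name.
    Raises RuntimeError on any state other than PROCESSING or ACTIVE,
    and TimeoutError if the file is still processing after timeout_s.
    """
    waited = 0.0
    while True:
        state = get_state()
        if state == "ACTIVE":
            return state
        if state != "PROCESSING":
            raise RuntimeError(f"unexpected file state: {state}")
        if waited >= timeout_s:
            raise TimeoutError("file still PROCESSING after timeout")
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Passing `sleep` as a parameter keeps the loop testable without real delays.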