Applied all changes and commented fixes.

google · Jun 14, 2024 · 101acec
1 parent c3dcea2
Showing 1 changed file with 3 additions and 3 deletions.
diff --git a/site/en/gemini-api/docs/vision.ipynb b/site/en/gemini-api/docs/vision.ipynb
@@ -204,7 +204,7 @@
       "source": [
         "### Upload an image file using the File API\n",
         "\n",
-        "Use the File API to upload an image of any size. (Images greater than 20MB cannot be handled inline and must be uploaded using the File API.)\n",
+        "Use the File API to upload an image of any size. (Always use the File API when the combination of files and system instructions that you intend to send is larger than 20MB.)\n",
         "\n",
         "**NOTE**: The File API lets you store up to 20GB of files per project, with a per-file maximum size of 2GB. Files are stored for 48 hours. They can be accessed in that period with your API key, but cannot be downloaded from the API. It is available at no cost in all regions where the Gemini API is available.\n",
         "\n",
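The size rule in this hunk can be expressed as a small check. The sketch below is illustrative only. `needs_file_api`, `INLINE_LIMIT_BYTES`, and `FILE_API_MAX_FILE_BYTES` are hypothetical names, not part of the Gemini API SDK. It assumes "20MB" and "2GB" mean binary units (MiB/GiB), and it treats the system instruction's size as its UTF-8 byte length.

```python
# Hypothetical helper, NOT part of the Gemini API SDK: applies the size limits
# quoted in the notebook text to decide whether to use the File API.

# Assumption: "20MB" and "2GB" are interpreted as binary units.
INLINE_LIMIT_BYTES = 20 * 1024 ** 2       # combined files + system instructions
FILE_API_MAX_FILE_BYTES = 2 * 1024 ** 3   # per-file maximum for the File API


def needs_file_api(file_sizes: list[int], system_instruction: str = "") -> bool:
    """Return True if the combined request is larger than the 20MB inline limit.

    Raises ValueError if any single file exceeds the File API's 2GB per-file cap,
    since such a file cannot be sent either way.
    """
    for size in file_sizes:
        if size > FILE_API_MAX_FILE_BYTES:
            raise ValueError(f"file of {size} bytes exceeds the 2GB File API limit")
    # Count the system instruction by its encoded size (an assumption).
    total = sum(file_sizes) + len(system_instruction.encode("utf-8"))
    return total > INLINE_LIMIT_BYTES


print(needs_file_api([5 * 1024 ** 2]))                   # False: small enough to send inline
print(needs_file_api([15 * 1024 ** 2, 6 * 1024 ** 2]))   # True: 21MB combined
```

The check sums all files because the notebook states the threshold in terms of the *combination* of files and system instructions, not each file individually.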
@@ -331,7 +331,7 @@
         "\n",
         "<img width=400 src=\"https://ai.google.dev/tutorials/images/colab_upload.png\">\n",
         "\n",
-        "When the combination of files and system instructions that you intend to send is larger than 20MB in size, use the File API to upload those files, as previously shown. Smaller files can instead be called locally from the Gemini API: Smaller files can be called locally from the Gemini API:\n"
+        "When the combination of files and system instructions that you intend to send is larger than 20MB in size, use the File API to upload those files, as previously shown. Smaller files can instead be called locally from the Gemini API:\n"
       ]
     },
     {
@@ -450,7 +450,7 @@
         "\n",
         "**NOTE:** The finer details of fast action sequences may be lost at the 1FPS frame sampling rate. Consider slowing down high-speed clips for improved inference quality.\n",
         "\n",
-        "Individual frames are 258 tokens, and audio is 32 tokens per second. With metadata, each second of video becomes ~300 tokens, which means a 1M context window can fit about 55.5 minutes of video.\n",
+        "Individual frames are 258 tokens, and audio is 32 tokens per second. With metadata, each second of video becomes ~300 tokens, which means a 1M context window can fit slightly less than an hour of video.\n",
         "\n",
         "To ask questions about time-stamped locations, use the format `MM:SS`, where the first two digits represent minutes and the last two digits represent seconds.\n",
         "\n",
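The arithmetic behind "slightly less than an hour" can be checked directly. The sketch below uses the per-frame, per-second-of-audio, and ~300-tokens-per-second figures from the notebook text. Like the original "55.5 minutes" figure, it assumes that "1M" means exactly 1,000,000 tokens. It also formats the result with the `MM:SS` timestamp convention. The constant names and the `to_mm_ss` helper are hypothetical and are not SDK APIs.

```python
# Back-of-the-envelope video token budget, using figures quoted in the notebook.
FRAME_TOKENS = 258                # tokens per sampled frame (1 frame per second)
AUDIO_TOKENS_PER_SECOND = 32      # audio tokens per second of video
TOKENS_PER_VIDEO_SECOND = 300     # approximate total per second, including metadata
CONTEXT_WINDOW_TOKENS = 1_000_000  # assumption: "1M" taken as exactly 1,000,000


def to_mm_ss(seconds: int) -> str:
    """Format a whole number of seconds as MM:SS (hypothetical helper)."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


raw_per_second = FRAME_TOKENS + AUDIO_TOKENS_PER_SECOND   # before metadata
max_seconds = CONTEXT_WINDOW_TOKENS // TOKENS_PER_VIDEO_SECOND

print(raw_per_second)               # 290
print(max_seconds)                  # 3333
print(f"{max_seconds / 60:.2f}")    # 55.55 (minutes)
print(to_mm_ss(max_seconds))        # 55:33
```

The result, about 55.5 minutes, is consistent with the revised wording. Because the per-second cost is an approximation, "slightly less than an hour" is a safer description than a precise minute count.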