Prompt caching does not work for models/gemini-1.5-pro-002 #661

Open
nharada1 opened this issue Jan 2, 2025 · 10 comments
Labels: component:python sdk (Issue/PR related to Python SDK), status:triaged (Issue/PR triaged to the corresponding sub-team), type:help (Support-related issues)

Comments

@nharada1

nharada1 commented Jan 2, 2025

Description of the bug:

Using 'models/gemini-1.5-pro-002' as the model for prompt caching fails when creating the cache. To reproduce, run the example code at https://ai.google.dev/gemini-api/docs/caching?lang=python but replace the model name with pro-002.
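
For reference, my reproduction is essentially the doc example with the model string swapped; a rough sketch (file name, prompt, and TTL are placeholders):

```python
import datetime
import time

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# Upload the sample video from the caching docs via the File API
# and wait for it to finish processing.
video_file = genai.upload_file(path="Sherlock_Jr_FullMovie.mp4")
while video_file.state.name == "PROCESSING":
    time.sleep(2)
    video_file = genai.get_file(video_file.name)

# Create the cache with the pro-002 model instead of pro-001.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-pro-002",
    display_name="sherlock jr movie",
    system_instruction="You are an expert video analyzer.",
    contents=[video_file],
    ttl=datetime.timedelta(minutes=5),
)

# Generate against the cached content -- with pro-002 this flow produces
# the 400 "Request contains an invalid argument" error described below.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
response = model.generate_content("Summarize this movie please.")
print(response.usage_metadata)
```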

Actual vs expected behavior:

Expected: Caching writes and reads successfully

Actual: On write, I get the error "BadRequest: 400 POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-002:generateContent?%24alt=json%3Benum-encoding%3Dint: Request contains an invalid argument."

Any other information you'd like to share?

The context caching doc claims: "Note: Context caching is only available for stable models with fixed versions (for example, gemini-1.5-pro-001). You must include the version postfix (for example, the -001 in gemini-1.5-pro-001)."

So I'd expect that pro-002, being a stable version, should work.

@manojssmk added the status:triaged and component:python sdk labels on Jan 2, 2025
@vishal-dharm
Collaborator

Hey @nharada1, could you try using the Caching notebook? I tested with this notebook after updating the model to 1.5 Pro 002 and it worked as expected.

I'll also review the documentation example and update it with a fix.

@nharada1
Author

nharada1 commented Jan 4, 2025

The caching notebook works fine for me with the Apollo text, but it fails with a video, so maybe this is specifically a video issue? I tried the caching notebook with the Sherlock video and it fails with the same error:

WARNING:tornado.access:400 POST /v1beta/models/gemini-1.5-pro-002:generateContent?%24alt=json%3Benum-encoding%3Dint (127.0.0.1) 3463.38ms
---------------------------------------------------------------------------
BadRequest                                Traceback (most recent call last)
<ipython-input-38-befb1830dc79> in <cell line: 3>()
      1 apollo_model = genai.GenerativeModel.from_cached_content(cached_content=apollo_cache)
      2 
----> 3 response = apollo_model.generate_content("Summarize this movie please.")
      4 print(response)

9 frames
/usr/local/lib/python3.10/dist-packages/google/ai/generativelanguage_v1beta/services/generative_service/transports/rest.py in __call__(self, request, retry, timeout, metadata)
    845             # subclass.
    846             if response.status_code >= 400:
--> 847                 raise core_exceptions.from_http_response(response)
    848 
    849             # Return the response

BadRequest: 400 POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-002:generateContent?%24alt=json%3Benum-encoding%3Dint: Request contains an invalid argument.
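
For comparison, the text-only path from the notebook that does work for me looks roughly like this (the transcript file and prompt are the ones the notebook uses; names approximate):

```python
import google.generativeai as genai
from google.generativeai import caching

# Upload the Apollo 11 transcript (plain text) and cache it -- this path
# succeeds with models/gemini-1.5-pro-002.
transcript_file = genai.upload_file(path="a11.txt")

transcript_cache = caching.CachedContent.create(
    model="models/gemini-1.5-pro-002",
    display_name="apollo 11 transcript",
    system_instruction="You are an expert at analyzing transcripts.",
    contents=[transcript_file],
)

transcript_model = genai.GenerativeModel.from_cached_content(cached_content=transcript_cache)
response = transcript_model.generate_content("Find a lighthearted moment from this transcript.")
print(response.text)
```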

@Gunand3043

Hey @nharada1, I tried context caching with the video, and it worked fine. Please take a look.

Video_Caching

Thanks.

@Gunand3043 added the status:awaiting user response and type:help labels and removed the status:triaged label on Jan 6, 2025
@nharada1
Author

nharada1 commented Jan 6, 2025 via email

@Giom-V
Copy link
Contributor

Giom-V commented Jan 7, 2025

@nharada1 My guess is that there's something in your video that triggers the safety settings.
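
One quick way to test that hypothesis (a rough sketch; the relaxed thresholds are only for debugging) is to pass explicit safety settings to the cached-content model and inspect the prompt feedback when a response does come back:

```python
from google.generativeai.types import HarmBlockThreshold, HarmCategory

relaxed_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}

# apollo_model is the model built from the cached video content in the notebook.
response = apollo_model.generate_content(
    "Summarize this movie please.",
    safety_settings=relaxed_settings,
)

# If safety filtering is the culprit, block_reason / finish_reason / safety_ratings
# should say so (assuming a response comes back at all rather than a hard 400).
print(response.prompt_feedback)
for candidate in response.candidates:
    print(candidate.finish_reason, candidate.safety_ratings)
```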

@nharada1
Author

nharada1 commented Jan 8, 2025

Is that the error that's occurring when it gives me the 400? I don't see anything about safety settings being triggered, just a 400 POST error. When I do the same query without caching, I don't see an issue either.

Also, that video is from the published Google documentation and works fine with both of the other models.

Here is my reproduction notebook:

https://colab.research.google.com/drive/1FtSL3BmiwomOCAekDReo5D4pnw8gMdGs?usp=sharing

@Giom-V
Contributor

Giom-V commented Jan 9, 2025

@nharada1 it seems we have an issue with the 1.5 model and the file API. Let me check if that could be related and I'll come back to you.

@Gunand3043 added the status:triaged label and removed the status:awaiting user response label on Jan 10, 2025
@Giom-V
Contributor

Giom-V commented Jan 13, 2025

The issue I was checking has been fixed, but it did not solve yours. I'm still trying to figure out what the problem is.

@nharada1
Author

nharada1 commented Jan 13, 2025 via email

@MarkDaoust
Collaborator

Could the problem here just be that Flash has a free tier for caching, but Pro doesn't? I would have expected some kind of quota error...

https://ai.google.dev/pricing#1_5flash
https://ai.google.dev/pricing#1_5pro
