Google: handle FinishReason.MALFORMED_FUNCTION_CALL #1442
Looks like some Could this be because we don't use lockfiles?
@jjallaire I'm not going to fix what seems to me like an unrelated CI bug. Would you consider either (a) fixing the CI behaviour or (b) merging despite the CI failures?
Some related discussion: googleapis/python-aiplatform#4472
Yeah, that's Mistral making a breaking change in their API. I'll take care of that in a separate PR.
I am partial to "unknown" just to not knee-jerk spray another stop reason, but let me take a closer look and think on it tomorrow.
Makes sense, but with lockfiles this would not cause random breakages :)
It's a feature that there is breakage: otherwise we would never be alerted that they have made a breaking change, which just throws the problem into users' laps. The mindset "we never want anything external to break our CI" is fine for a production deployment, but a package that wants to be a good citizen and flexible w/r/t dependency resolution in myriad settings needs to know about these ASAP.
To be consistent with the way this is handled for other providers, we would ideally have a stop reason of "tool_calls" and allow the invalid function call (or some simulation thereof) to propagate through. It is quite common for models to call tools that don't exist or provide incorrect JSON schema -- the default tool loop handles these cases by replying to the model letting them know that they've made an incorrect tool call (and they will very often successfully recover).
If we do stop reason "unknown" the tool loop will just end (note that for some agents including basic_agent() the model will be re-prompted to continue, but critically they won't get any feedback that their tool call was wrong, possibly leading them to just make the same mistake again).
All of that said I don't even know whether anything like what I am suggesting is possible. It really depends on what Google returns if anything along with this stop reason. In the case that the context is insufficient we may be stuck with "unknown".
Unfortunately I don’t think Google returns anything. You can see the test I
added, where I put a candidate object exactly as it appears when I set a
breakpoint and run with a Gemini model.
…On Fri, 7 Mar 2025 at 03:07, jjallaire ***@***.***> wrote:
To be consistent with the way this is handled for other providers, we
would ideally have a stop reason of "tool_calls" and allow the invalid
function call (or some simulation thereof) to propagate through. It is
quite common for models to call tools that don't exist or provide incorrect
JSON schema -- the default tool loop handles these cases by replying to the
model letting them know that they've made an incorrect tool call (and they
will very often successfully recover).
If we do stop reason "unknown" the tool loop will just end (note that for
some agents including basic_agent() the model will be re-prompted to
continue, but critically they won't get any feedback that their tool call
was wrong, possibly leading them to just make the same mistake again).
All of that said I don't even know whether anything like what I am
suggesting is possible. It really depends on what Google returns if
anything along with this stop reason. In the case that the context is
insufficient we may be stuck with "unknown".
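The difference between the two stop reasons discussed above can be sketched roughly as follows. This is not Inspect's actual tool loop: `Output`, `TOOLS`, `tool_loop`, and `fake_model` are hypothetical stand-ins invented for illustration, and messages are plain strings. It only shows why "tool_calls" lets the model see an error and retry, while "unknown" ends the loop with no feedback.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Output:
    """Hypothetical stand-in for a model response (not an Inspect type)."""
    stop_reason: str
    tool_name: str | None = None
    arguments: tuple = ()
    text: str = ""


# Hypothetical tool registry with a single tool.
TOOLS: dict[str, Callable[..., object]] = {"add": lambda a, b: a + b}


def tool_loop(
    generate: Callable[[list[str]], Output], max_turns: int = 5
) -> tuple[list[str], Output]:
    """Keep calling the model while it requests tools; report bad calls back."""
    messages: list[str] = []
    output = generate(messages)
    for _ in range(max_turns):
        if output.stop_reason != "tool_calls":
            break  # any other stop reason (e.g. "unknown") ends the loop
        tool = TOOLS.get(output.tool_name or "")
        if tool is None:
            # Feedback the model can use to correct its call on the next turn.
            messages.append(f"error: unknown tool {output.tool_name!r}")
        else:
            messages.append(f"result: {tool(*output.arguments)}")
        output = generate(messages)
    return messages, output


def fake_model(messages: list[str]) -> Output:
    """Toy model: misspells the tool name once, then recovers after an error."""
    if not messages:
        return Output("tool_calls", tool_name="ad", arguments=(1, 2))
    if messages[-1].startswith("error"):
        return Output("tool_calls", tool_name="add", arguments=(1, 2))
    return Output("stop", text=messages[-1])
```

With `fake_model`, the loop records one error message and then a successful result. A model that returns `Output("unknown")` on its first turn leaves the message list empty, so a re-prompt would carry no hint about what went wrong.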
This PR contains:
What is the current behavior? (You can also link to an open issue here)
When the finish reason is FinishReason.MALFORMED_FUNCTION_CALL, Inspect raises inside completion_choice_from_candidate because content.parts is None:
inspect_ai/src/inspect_ai/model/_providers/google.py
Line 560 in db5f359
What is the new behavior?
Properly handle the case FinishReason.MALFORMED_FUNCTION_CALL. I decided to map it to StopReason "unknown", but I am also open to adding a new option to StopReason if that is preferred.
Added a test.
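As a rough illustration of the behavior described above, here is a minimal sketch. It is not the code in this PR: `FinishReason`, `Part`, `Content`, and `Candidate` below are simplified stand-ins for the Google SDK types, the `StopReason` literal is an abbreviated illustrative set, and `stop_reason_for` and `text_and_stop_reason` are hypothetical helper names. It shows a candidate whose `content.parts` is None being handled without raising, with the malformed function call mapped to "unknown".

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Literal, Optional

# Abbreviated, illustrative set of stop reasons (assumption, not Inspect's full type).
StopReason = Literal["stop", "max_tokens", "tool_calls", "unknown"]


class FinishReason(Enum):
    """Simplified stand-in for the SDK's finish reason enum."""
    STOP = "STOP"
    MAX_TOKENS = "MAX_TOKENS"
    MALFORMED_FUNCTION_CALL = "MALFORMED_FUNCTION_CALL"


@dataclass
class Part:
    text: str


@dataclass
class Content:
    # None mirrors what the thread reports for MALFORMED_FUNCTION_CALL.
    parts: Optional[list[Part]] = None


@dataclass
class Candidate:
    finish_reason: FinishReason
    content: Content


def stop_reason_for(finish_reason: FinishReason) -> StopReason:
    """Map a finish reason to a stop reason; anything unrecognized is 'unknown'."""
    if finish_reason is FinishReason.STOP:
        return "stop"
    if finish_reason is FinishReason.MAX_TOKENS:
        return "max_tokens"
    # MALFORMED_FUNCTION_CALL falls through to "unknown" in this sketch.
    return "unknown"


def text_and_stop_reason(candidate: Candidate) -> tuple[str, StopReason]:
    """Join the text parts, treating missing parts as empty instead of raising."""
    parts = candidate.content.parts or []
    text = "".join(part.text for part in parts)
    return text, stop_reason_for(candidate.finish_reason)
```

For example, `text_and_stop_reason(Candidate(FinishReason.MALFORMED_FUNCTION_CALL, Content(parts=None)))` yields an empty string and "unknown" instead of failing while iterating over None.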