Google: handle FinishReason.MALFORMED_FUNCTION_CALL #1442

tadamcz · 2025-03-06T20:04:32Z

This PR contains:

What is the current behavior? (You can also link to an open issue here)

When a candidate's finish reason is FinishReason.MALFORMED_FUNCTION_CALL, Inspect raises inside completion_choice_from_candidate because content.parts is None.

inspect_ai/src/inspect_ai/model/_providers/google.py, line 560 in db5f359:

    def completion_choice_from_candidate(candidate: Candidate) -> ChatCompletionChoice:

What is the new behavior?

Properly handle the case FinishReason.MALFORMED_FUNCTION_CALL. I decided to map it to StopReason "unknown", but I am also open to adding a new option to StopReason if that is preferred.

Added a test
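
For illustration, here is a minimal sketch of the mapping described above. It is not the code in google.py: the `FinishReason`, `Content`, and `Candidate` classes are simplified stand-ins for the Google SDK types, and `stop_reason_from_finish_reason` and `text_from_candidate` are hypothetical helper names. The only behavior it is meant to show is that MALFORMED_FUNCTION_CALL maps to "unknown" and that a missing `content.parts` no longer raises.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FinishReason(Enum):
    # Simplified stand-in for the SDK enum; only a few members shown.
    STOP = "STOP"
    MAX_TOKENS = "MAX_TOKENS"
    SAFETY = "SAFETY"
    MALFORMED_FUNCTION_CALL = "MALFORMED_FUNCTION_CALL"


@dataclass
class Content:
    # parts is None when the model emitted a malformed function call.
    parts: Optional[list[str]] = None


@dataclass
class Candidate:
    finish_reason: FinishReason
    content: Optional[Content] = None


def stop_reason_from_finish_reason(reason: FinishReason) -> str:
    """Map a provider finish reason to a stop reason string (hypothetical helper)."""
    mapping = {
        FinishReason.STOP: "stop",
        FinishReason.MAX_TOKENS: "max_tokens",
        FinishReason.SAFETY: "content_filter",
        FinishReason.MALFORMED_FUNCTION_CALL: "unknown",
    }
    return mapping.get(reason, "unknown")


def text_from_candidate(candidate: Candidate) -> str:
    """Join the candidate's parts, treating missing content or parts as empty."""
    parts = candidate.content.parts if candidate.content else None
    return "".join(parts or [])


malformed = Candidate(FinishReason.MALFORMED_FUNCTION_CALL, Content(parts=None))
print(stop_reason_from_finish_reason(malformed.finish_reason), repr(text_from_candidate(malformed)))
```

The guard on `parts` is the important piece: iterating over `content.parts` without it is what fails when the value is None.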

tadamcz · 2025-03-06T20:12:58Z

Looks like there are some mypy errors in unchanged files (in the mistral provider), which should not be possible 🤔

Could this be because we don't use lockfiles?

tadamcz · 2025-03-06T20:27:48Z

@jjallaire I'm not going to fix what seems to me like an unrelated CI bug. Would you consider either (a) fixing the CI behaviour or (b) merging despite the CI failures?

jjallaire · 2025-03-06T21:33:50Z

Some related discussion: googleapis/python-aiplatform#4472

jjallaire · 2025-03-06T21:36:03Z

> @jjallaire I'm not going to fix what seems to me like an unrelated CI bug. Would you consider either (a) fixing the CI behaviour or (b) merging despite the CI failures?

Yeah that's mistral making a breaking change in their API. I'll take care of that in a separate PR.

jjallaire · 2025-03-06T21:36:55Z

I am partial to "unknown" just to not knee-jerk spray another stop reason, but let me take a closer look and think on it tomorrow.

tadamcz · 2025-03-06T21:47:47Z

> Yeah that's mistral making a breaking change in their API

Makes sense, but with lockfiles this would not cause random breakages :)

jjallaire · 2025-03-07T10:17:40Z

> Makes sense, but with lockfiles this would not cause random breakages :)

It's a feature that there is breakage --- otherwise we would never be alerted that they have made a breaking change, which just throws the problem into users' laps. The mindset "we never want anything external to break our CI" is fine for a production deployment, but a package that wants to be a good citizen and flexible w/r/t dependency resolution in myriad settings needs to know about these ASAP.

jjallaire · 2025-03-07T11:07:00Z

To be consistent with the way this is handled for other providers, we would ideally have a stop reason of "tool_calls" and allow the invalid function call (or some simulation thereof) to propagate through. It is quite common for models to call tools that don't exist or provide incorrect JSON schema -- the default tool loop handles these cases by replying to the model letting them know that they've made an incorrect tool call (and they will very often successfully recover).

If we do stop reason "unknown" the tool loop will just end. Note that for some agents, including basic_agent(), the model will be re-prompted to continue, but critically it won't get any feedback that its tool call was wrong, possibly leading it to just make the same mistake again.

All of that said I don't even know whether anything like what I am suggesting is possible. It really depends on what Google returns if anything along with this stop reason. In the case that the context is insufficient we may be stuck with "unknown".
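
To make the difference concrete, here is a rough sketch of the idea, not Inspect's actual tool loop. `ToolCall`, `ModelOutput`, `TOOLS`, and `handle_output` are all hypothetical names made up for this example. It only shows why a "tool_calls" stop reason lets the loop send an error back to the model, while "unknown" ends the loop with no feedback.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class ModelOutput:
    stop_reason: str
    tool_calls: list[ToolCall] = field(default_factory=list)


# Hypothetical tool registry for the example.
TOOLS = {"add": lambda a, b: a + b}


def handle_output(output: ModelOutput) -> list[str]:
    """Return the messages to send back to the model; an empty list ends the loop."""
    if output.stop_reason != "tool_calls":
        # "unknown" lands here: nothing is sent back, so the model gets no feedback.
        return []
    replies = []
    for call in output.tool_calls:
        tool = TOOLS.get(call.name)
        if tool is None:
            replies.append(f"Error: tool '{call.name}' not found")
            continue
        try:
            replies.append(str(tool(**call.arguments)))
        except TypeError as ex:
            # Wrong or missing arguments are reported rather than raised.
            replies.append(f"Error: invalid arguments for '{call.name}': {ex}")
    return replies


print(handle_output(ModelOutput("unknown")))
print(handle_output(ModelOutput("tool_calls", [ToolCall("ad", {})])))
```

Whether Google's response carries enough information to build something like the `ToolCall` above is exactly the open question.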

tadamcz · 2025-03-07T12:47:48Z

Unfortunately I don’t think Google returns anything. You can see this in the test I added, where I use a candidate object exactly as it appears when I set a breakpoint and run with a Gemini model.


tadamcz added 2 commits March 6, 2025 11:44

handle FinishReason.MALFORMED_FUNCTION_CALL

9e6dcf8

add test for FinishReason.MALFORMED_FUNCTION_CALL

6f62760

ruff

c962767

tadamcz force-pushed the google-handle-malformed-function-call branch from ee55fc2 to c962767 Compare March 6, 2025 20:14

Merge branch 'main' into google-handle-malformed-function-call

b4af555

jjallaire approved these changes Mar 7, 2025

Merge branch 'main' into google-handle-malformed-function-call

c1d63df

jjallaire approved these changes Mar 7, 2025

jjallaire merged commit 895282c into UKGovernmentBEIS:main Mar 7, 2025
10 checks passed
