.Net: Bug: Azure-hosted Open-Source Models (e.g., Mistral Nemo) fail with Kernel Functions in .NET #9933

Closed
GregorBiswanger opened this issue Dec 10, 2024 · 8 comments · Fixed by #9954
Assignees
Labels
bug Something isn't working .NET Issue or Pull requests regarding .NET code

Comments

@GregorBiswanger

Describe the bug
Semantic Kernel Functions in .NET using Microsoft.SemanticKernel.Connectors.AzureAIInference do not work with Azure-hosted open-source models like "Mistral Nemo," even though these models support Function Calls. The same setup works fine with GPT models, and in Python, it works with Azure AI without any issues. This seems to be a .NET-specific issue with Semantic Kernel.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy an open-source model (e.g., "Mistral Nemo") in Azure AI.
  2. Integrate the connector Microsoft.SemanticKernel.Connectors.AzureAIInference.
  3. Use PromptExecutionSettings with FunctionChoiceBehavior.Auto().
  4. Implement and execute a Semantic Kernel Function.

Expected behavior
The Semantic Kernel Function should execute successfully using Function Calls with the Azure-hosted open-source model, similar to how it works with GPT models or with Python implementations.

Screenshots
Not applicable

Platform

  • OS: Windows
  • IDE: Visual Studio 2022
  • Language: C#
  • Source:
    • Semantic Kernel version: 1.32.0
    • Microsoft.SemanticKernel.Connectors.AzureAIInference version: 1.32.0-beta

Additional context

  • The same functions work with local runtimes such as LM Studio (Beta) or Ollama.
  • In Python, Function Calls with Azure-hosted models work as expected.
  • This issue seems specific to the Semantic Kernel in .NET when using Azure AI.

Please provide guidance or a fix to enable Semantic Kernel Functions to work with Azure-hosted open-source models.

@GregorBiswanger GregorBiswanger added the bug Something isn't working label Dec 10, 2024
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code python Pull requests for the Python Semantic Kernel triage labels Dec 10, 2024
@github-actions github-actions bot changed the title Bug: Azure-hosted Open-Source Models (e.g., Mistral Nemo) fail with Kernel Functions in .NET .Net: Bug: Azure-hosted Open-Source Models (e.g., Mistral Nemo) fail with Kernel Functions in .NET Dec 10, 2024
@github-actions github-actions bot changed the title .Net: Bug: Azure-hosted Open-Source Models (e.g., Mistral Nemo) fail with Kernel Functions in .NET Python: Bug: Azure-hosted Open-Source Models (e.g., Mistral Nemo) fail with Kernel Functions in .NET Dec 10, 2024
@GregorBiswanger GregorBiswanger changed the title Python: Bug: Azure-hosted Open-Source Models (e.g., Mistral Nemo) fail with Kernel Functions in .NET Dotnet: Bug: Azure-hosted Open-Source Models (e.g., Mistral Nemo) fail with Kernel Functions in .NET Dec 10, 2024
@GregorBiswanger GregorBiswanger changed the title Dotnet: Bug: Azure-hosted Open-Source Models (e.g., Mistral Nemo) fail with Kernel Functions in .NET .Net: Bug: Azure-hosted Open-Source Models (e.g., Mistral Nemo) fail with Kernel Functions in .NET Dec 10, 2024
@sphenry sphenry removed the triage label Dec 10, 2024
@sphenry
Member

sphenry commented Dec 10, 2024

@markwallace-microsoft can you take a look?

@markwallace-microsoft markwallace-microsoft moved this to Sprint: In Progress in Semantic Kernel Dec 10, 2024
@moonbox3 moonbox3 removed the python Pull requests for the Python Semantic Kernel label Dec 11, 2024
@markwallace-microsoft
Member

@GregorBiswanger I created a sample for this and tested with Mistral Nemo and didn't see any issues. Take a look here: #9954

@markwallace-microsoft markwallace-microsoft moved this from Sprint: In Progress to Sprint: In Review in Semantic Kernel Dec 12, 2024
@GregorBiswanger
Author

@markwallace-microsoft Just to confirm: did you test Mistral Nemo hosted on Azure AI, and not through Ollama or a local setup? That distinction matters here.

@GregorBiswanger
Author

Hi @markwallace-microsoft,

After further investigation, I’ve identified that the issue specifically occurs when streaming is enabled for Azure-hosted open-source models (e.g., Mistral Nemo) using Semantic Kernel in .NET.

What I tested:

  1. Validation of function parameters:

    • I ensured all functions were configured with their required parameters correctly. There were no missing or misconfigured parameters.
  2. Simplified system prompts:

    • I reduced the complexity of the system prompt, removing Markdown and special encodings to rule out parsing issues. This worked fine but didn’t resolve the issue when streaming was enabled.
  3. Disabling Streaming:

    • When I disabled streaming, the function calls worked perfectly. This pinpointed the issue to be specific to the streaming functionality.

Code that does not work with Streaming:

#pragma warning disable SKEXP0070

using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.AzureAIInference;
using OllamaApiFacade.Extensions;
using SemanticFlow.DemoWebApi;

var azureKeyVaultHelper = new AzureKeyVaultHelper("https://XXX.vault.azure.net");
var endpoint = await azureKeyVaultHelper.GetSecretAsync("AZURE-MINISTRAL-NEMO-ENDPOINT");
var apiKey = await azureKeyVaultHelper.GetSecretAsync("AZURE-MINISTRAL-NEMO-KEY");

// Kernel setup
var builder = Kernel.CreateBuilder()
    .AddAzureAIInferenceChatCompletion("Mistral-Nemo-oclsi", apiKey, new Uri(endpoint));

// Route traffic through a Burp Suite proxy to inspect the requests sent to the backend
// builder.Services.AddProxyForDebug();

var kernel = builder.Build();

// Import plugin functions
kernel.ImportPluginFromFunctions("HelperFunctions",
[
    kernel.CreateFunctionFromMethod(() => new List<string> { "Squirrel Steals Show", "Dog Wins Lottery" },
        "GetLatestNewsTitles", "Retrieves latest news titles."),
    kernel.CreateFunctionFromMethod(() => DateTime.UtcNow.ToString("R"), 
        "GetCurrentUtcDateTime", "Retrieves the current date time in UTC."),
    kernel.CreateFunctionFromMethod((string cityName, string currentDateTime) =>
    {
        if (string.IsNullOrEmpty(cityName) || string.IsNullOrEmpty(currentDateTime))
        {
            throw new ArgumentException("cityName and currentDateTime are required.");
        }
        return cityName switch
        {
            "Boston" => "61 and rainy",
            "London" => "55 and cloudy",
            "Miami" => "80 and sunny",
            "Paris" => "60 and rainy",
            "Tokyo" => "50 and sunny",
            "Sydney" => "75 and sunny",
            "Tel Aviv" => "80 and sunny",
            _ => "31 and snowing",
        };
    }, "GetWeatherForCity", "Gets the current weather for the specified city, using the city name and current UTC date/time.")
]);

// Execution settings with automatic function choice (streaming is selected by the GetStreaming... call below)
var settings = new AzureAIInferencePromptExecutionSettings { FunctionChoiceBehavior = FunctionChoiceBehavior.Auto() };

var chatHistory = new ChatHistory();
chatHistory.AddUserMessage("What is the weather in Tokyo based on the current date and time?");

// Streaming with IChatCompletionService
var chatCompletionService = kernel.Services.GetRequiredService<IChatCompletionService>();

try
{
    // Stream the response
    await foreach (var message in chatCompletionService.GetStreamingChatMessageContentsAsync(chatHistory, settings, kernel))
    {
        if (message.Role.HasValue)
        {
            Console.Write($"{message.Role.Value}: ");
        }
        if (!string.IsNullOrEmpty(message.Content))
        {
            Console.Write(message.Content);
        }
    }
    Console.WriteLine("\nStreaming completed.");
}
catch (Exception ex)
{
    // Log any exceptions
    Console.WriteLine($"Error: {ex.Message}");
}

Cheers,
Gregor

@GregorBiswanger
Author

Hi @markwallace-microsoft,

I’ve discovered something new that might narrow down the issue further. It appears the problem isn’t just related to streaming but to Chat Completions in general.

Using Burp Suite, I analyzed the request JSON being sent. I noticed that the JSON contains two separate tools entries, which may be causing unexpected behavior. From what I observed:

  • Azure AI seems to focus on the last tools entry in the JSON, potentially ignoring the others.
  • In contrast, local solutions like LM-Studio (Beta) handled the multiple tools entries without any issues.

This leads me to believe there’s a bug in how Chat Completions handle these duplicate entries in the request JSON.

Here’s the exact JSON being sent (identical even with streaming enabled):

POST /chat/completions?api-version=2024-05-01-preview HTTP/1.1
Host: mistral-nemo-oclsi.swedencentral.models.ai.azure.com
Accept: application/json
...
Content-Type: application/json
Content-Length: 936
Connection: keep-alive

{
    "messages": [
        {
            "content": [
                {
                    "text": "What is the weather in Tokyo based on the current date and time?",
                    "type": "text"
                }
            ],
            "role": "user"
        }
    ],
    "model": "Mistral-Nemo-oclsi",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "HelperFunctions-GetLatestNewsTitles",
                "description": "Retrieves latest news titles.",
                "parameters": {
                    "type": "object",
                    "required": [],
                    "properties": {}
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "HelperFunctions-GetCurrentUtcDateTime",
                "description": "Retrieves the current date time in UTC.",
                "parameters": {
                    "type": "object",
                    "required": [],
                    "properties": {}
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "HelperFunctions-GetWeatherForCity",
                "description": "Gets the current weather for the specified city, using the city name and current UTC date/time.",
                "parameters": {
                    "type": "object",
                    "required": ["cityName", "currentDateTime"],
                    "properties": {
                        "cityName": {
                            "type": "string"
                        },
                        "currentDateTime": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    ],
    "tool_choice": "auto",
    "stop": [],
    "tools": []
}

Key Observation:
The JSON contains two tools arrays:

  • A populated "tools" array holding the three function definitions, and a second, empty "tools": [] at the end of the request body.

While this structure doesn’t seem to break local solutions like LM-Studio, Azure AI appears to mishandle it by focusing on the last entry.

I now believe the bug lies in how Chat Completions process these entries in general. Let me know if additional details or logs are needed!
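
The duplicate-key observation can be checked mechanically. The following is a minimal sketch, not part of the connector, that uses only System.Text.Json: it lists top-level property names that occur more than once and shows what a parser that keeps the last definition would see. The class and method names (DuplicateKeyCheck, FindDuplicateTopLevelKeys, ToolsCountSeenByLastWinsParser) are hypothetical helpers for illustration. System.Text.Json documents that JsonElement.GetProperty matches the last definition of a repeated property; whether the Azure endpoint parses requests the same way is only the suspicion raised above, not something this sketch proves.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical helper for inspecting a captured request body.
public static class DuplicateKeyCheck
{
    // Returns the property names that appear more than once at the top level of a JSON object.
    public static List<string> FindDuplicateTopLevelKeys(string json)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<string>();
        var reader = new Utf8JsonReader(Encoding.UTF8.GetBytes(json));

        while (reader.Read())
        {
            // Property names that belong to the root object are reported at depth 1.
            if (reader.TokenType == JsonTokenType.PropertyName && reader.CurrentDepth == 1)
            {
                string name = reader.GetString() ?? string.Empty;
                if (!seen.Add(name) && !duplicates.Contains(name))
                {
                    duplicates.Add(name);
                }
            }
        }

        return duplicates;
    }

    // Counts the elements of the "tools" array as seen through JsonElement.GetProperty,
    // which matches the last definition when a property is repeated.
    public static int ToolsCountSeenByLastWinsParser(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("tools").GetArrayLength();
    }
}
```

Run against a captured body shaped like the one above, FindDuplicateTopLevelKeys should report tools, and the last-wins count should be 0, which would be consistent with the model never receiving the three function definitions.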

@markwallace-microsoft
Member

Thanks for the detailed investigation @GregorBiswanger. I suspect this is an issue in the Azure SDK, will continue to investigate.

@markwallace-microsoft markwallace-microsoft moved this from Sprint: In Review to Sprint: In Progress in Semantic Kernel Dec 13, 2024
@GregorBiswanger
Author

@markwallace-microsoft When I use IChatCompletionService with PromptExecutionSettings instead of AzureAIInferencePromptExecutionSettings, the request no longer contains duplicate tools entries, and function calling works perfectly.
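
For readers hitting the same problem, the workaround described in this comment might look roughly like the sketch below. It assumes a Kernel and ChatHistory built as in the earlier sample; the class and method names (GenericSettingsSketch, StreamWithGenericSettingsAsync) are hypothetical, and the only intended change from the failing code is the settings type.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical wrapper; kernel and chatHistory are assumed to be set up as in the earlier sample.
public static class GenericSettingsSketch
{
    public static async Task StreamWithGenericSettingsAsync(Kernel kernel, ChatHistory chatHistory)
    {
        // Use the base settings class rather than AzureAIInferencePromptExecutionSettings.
        var settings = new PromptExecutionSettings
        {
            FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
        };

        var chat = kernel.GetRequiredService<IChatCompletionService>();

        // Stream the reply; function calls are handled automatically because of Auto().
        await foreach (StreamingChatMessageContent chunk in
            chat.GetStreamingChatMessageContentsAsync(chatHistory, settings, kernel))
        {
            Console.Write(chunk.Content);
        }

        Console.WriteLine();
    }
}
```

This is a stopgap on the affected package versions, not a statement about how the connector was eventually fixed.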

@markwallace-microsoft markwallace-microsoft moved this from Sprint: In Progress to Sprint: In Review in Semantic Kernel Dec 16, 2024
github-merge-queue bot pushed a commit that referenced this issue Jan 6, 2025
### Motivation and Context

- Closes #9933 

@markwallace-microsoft markwallace-microsoft moved this from Sprint: In Review to Sprint: Done in Semantic Kernel Jan 8, 2025
@bauann

bauann commented Jan 15, 2025

Hi,
I am facing a similar issue. I deployed 3 different models to Azure AI (Machine Learning Studio) serverless endpoints:

  • Mistral-Nemo
  • Llama-3.3-70B-Instruct
  • Llama-3.2-90B-Vision-Instruct

I tested with the SemanticKernel package 1.33.0 and Microsoft.SemanticKernel.Connectors.AzureAIInference 1.33.0-beta. The Mistral-Nemo model works perfectly, but the Llama series models do not.

With Llama-3.3-70B, a chat completion call with tools returns an incorrect result and does not invoke any tools, regardless of whether I use GetChatMessageContentsAsync or GetStreamingChatMessageContentsAsync.

With Llama-3.2-90B, a chat completion call with tools throws an exception right away (error message below):

Azure.RequestFailedException: {"object":"error","message":"\"auto\" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set","type":"BadRequestError","param":null,"code":400}
Status: 400 (Bad Request)
ErrorCode: Bad Request

Content:
{"error":{"code":"Bad Request","message":"{\"object\":\"error\",\"message\":\"\\\"auto\\\" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set\",\"type\":\"BadRequestError\",\"param\":null,\"code\":400}","status":400}}

Headers:
x-ms-rai-invoked: REDACTED
x-envoy-upstream-service-time: REDACTED
X-Request-ID: REDACTED
ms-azureml-model-error-reason: REDACTED
ms-azureml-model-error-statuscode: REDACTED
ms-azureml-model-time: REDACTED
azureml-destination-model-group: REDACTED
azureml-destination-region: REDACTED
azureml-destination-deployment: REDACTED
azureml-destination-endpoint: REDACTED
x-ms-client-request-id: 42bc2161-5a3f-4e88-8c9e-577390db941e
Request-Context: REDACTED
azureml-model-session: REDACTED
azureml-model-group: REDACTED
Date: Wed, 15 Jan 2025 05:46:51 GMT
Content-Length: 246
Content-Type: application/json

   at Azure.Core.HttpPipelineExtensions.ProcessMessageAsync(HttpPipeline pipeline, HttpMessage message, RequestContext requestContext, CancellationToken cancellationToken)
   at Azure.AI.Inference.ChatCompletionsClient.CompleteAsync(RequestContent content, String extraParams, RequestContext context)
   at Azure.AI.Inference.ChatCompletionsClient.CompleteAsync(ChatCompletionsOptions chatCompletionsOptions, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.AzureAIInferenceChatClient.CompleteAsync(IList`1 chatMessages, ChatOptions options, CancellationToken cancellationToken)
   at Microsoft.Extensions.AI.FunctionInvokingChatClient.CompleteAsync(IList`1 chatMessages, ChatOptions options, CancellationToken cancellationToken)
   at Microsoft.SemanticKernel.ChatCompletion.ChatClientChatCompletionService.GetChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings executionSettings, Kernel kernel, CancellationToken cancellationToken)

Does the AzureAIInference connector currently have an issue with Llama series models?
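
The Content section of the exception above wraps a second JSON document as a string inside error.message, which makes the actual reason hard to read in logs. A minimal sketch for unwrapping it, using only System.Text.Json, is shown here; the NestedErrorMessage class and its Extract method are hypothetical helpers, and the assumed shape (an outer error.message that may itself contain a JSON object with a message field) is taken only from the response shown in this comment.

```csharp
using System.Text.Json;

// Hypothetical helper for reading the doubly encoded error body shown above.
public static class NestedErrorMessage
{
    // Returns the innermost "message" text. If the outer error.message is
    // plain text rather than embedded JSON, it is returned unchanged.
    public static string Extract(string responseContent)
    {
        using JsonDocument outer = JsonDocument.Parse(responseContent);
        string message = outer.RootElement
            .GetProperty("error")
            .GetProperty("message")
            .GetString() ?? string.Empty;

        try
        {
            using JsonDocument inner = JsonDocument.Parse(message);
            if (inner.RootElement.ValueKind == JsonValueKind.Object &&
                inner.RootElement.TryGetProperty("message", out JsonElement innerMessage) &&
                innerMessage.ValueKind == JsonValueKind.String)
            {
                return innerMessage.GetString() ?? message;
            }
        }
        catch (JsonException)
        {
            // The outer message was not JSON; fall through and return it as-is.
        }

        return message;
    }
}
```

Applied to the response above, this should surface the inner text about "auto" tool choice requiring --enable-auto-tool-choice and --tool-call-parser. That wording refers to server start-up flags, which suggests (without confirming) that the rejection comes from the model-serving layer of that deployment rather than from the connector.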

Projects
Status: Sprint: Done
5 participants