Improve exception handling in .NET isolated #3034

Open
wants to merge 7 commits into
base: dev

Contributor

@andystaples andystaples commented Feb 6, 2025

Improves exception handling in .NET isolated by serializing the exception details in the worker middleware, then deserializing them back in host middleware.
This may introduce breaking changes to the way that the Functions Host logs exceptions from Durable activities.
Also adds tests for error handling, activity retry, and activity timeout scenarios in dotnet-isolated.

Resolves #2711
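
For readers skimming the thread, here is a minimal sketch of the round trip described above: the worker captures the exception chain as structured data and serializes it, and the host tries to read it back. `FailureDetailsSketch` and `FailureDetailsSketchConverter` are hypothetical names for illustration only; they are not the extension's `TaskFailureDetails` type or the converter added in this PR, and the JSON shape is an assumption.

```csharp
using System;
using System.Text.Json;

// Hypothetical DTO for illustration; not the extension's TaskFailureDetails.
public sealed record FailureDetailsSketch(
    string ErrorType,
    string ErrorMessage,
    string? StackTrace,
    FailureDetailsSketch? InnerFailure);

public static class FailureDetailsSketchConverter
{
    // Worker side: walk the exception chain so inner exceptions are kept.
    public static FailureDetailsSketch? FromException(Exception? ex) =>
        ex is null
            ? null
            : new FailureDetailsSketch(
                ex.GetType().FullName ?? ex.GetType().Name,
                ex.Message,
                ex.StackTrace,
                FromException(ex.InnerException));

    // JSON escaping keeps quotes and newlines in the message intact.
    public static string Serialize(Exception ex) =>
        JsonSerializer.Serialize(FromException(ex));

    // Host side: returns false when the payload is not in the expected shape,
    // so the caller can fall back to treating it as a plain message.
    public static bool TryDeserialize(string? payload, out FailureDetailsSketch? details)
    {
        details = null;
        if (string.IsNullOrWhiteSpace(payload))
        {
            return false;
        }

        try
        {
            FailureDetailsSketch? parsed =
                JsonSerializer.Deserialize<FailureDetailsSketch>(payload);
            if (parsed is null || parsed.ErrorType is null || parsed.ErrorMessage is null)
            {
                return false;
            }

            details = parsed;
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

A caller would pass the thrown exception to `Serialize` on the worker side and hand the resulting string to `TryDeserialize` on the host side; a `false` result means the message was never serialized this way and should be used as-is.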

Pull request checklist

  • My changes do not require documentation changes
    • Otherwise: Documentation PR is ready to merge and referenced in pending_docs.md
  • My changes should not be added to the release notes for the next release
    • Otherwise: I've added my notes to release_notes.md
  • My changes do not need to be backported to a previous version
    • Otherwise: Backport tracked by issue/PR #issue_or_pr
  • I have added all required tests (Unit tests, E2E tests)
  • My changes do not require any extra work to be leveraged by OutOfProc SDKs
    • Otherwise: That work is being tracked here: #issue_or_pr_in_each_sdk
  • My changes do not change the version of the WebJobs.Extensions.DurableTask package
    • Otherwise: major or minor version updates are reflected in /src/Worker.Extensions.DurableTask/AssemblyInfo.cs
  • My changes do not add EventIds to our EventSource logs
    • Otherwise: Ensure the EventIds are within the supported range in our existing Windows infrastructure. You may validate this with a deployed app's telemetry. You may also extend the range by completing a PR such as this one.
  • My changes should be added to v2.x branch.
    • Otherwise: This change applies exclusively to WebJobs.Extensions.DurableTask v3.x. It will be retained only in the dev and main branches and will not be merged into the v2.x branch.

- Serialize exception details in worker middleware
- Attempt to deserialize in host middleware
- Ensure inner exceptions + complex messages are preserved
@andystaples andystaples changed the title from "Add error handling and timeout tests" to "Improve exception handling in .NET isolated" Feb 6, 2025
test/e2e/Tests/Tests/ErrorHandlingTests.cs (outdated, resolved)
test/e2e/Tests/Tests/ErrorHandlingTests.cs (resolved)
test/e2e/Tests/Tests/ErrorHandlingTests.cs (outdated, resolved)
test/e2e/Tests/Tests/TimeoutTests.cs (resolved)
src/WebJobs.Extensions.DurableTask/OutOfProcMiddleware.cs (outdated, resolved)

public override string? StackTrace => this.FromException.StackTrace;

private static TaskFailureDetails? ExceptionToTaskFailureDetailsRecursive(Exception? fromException)
Member

Do we have a shared version of this we can use? Or is it in some inaccessible place, so we have to copy it? I'm a tiny bit worried about keeping multiple copies up-to-date if we decide to add richer information to TaskFailureDetails (which is very likely).

Contributor Author

Moved to a new TaskFailureDetailsConverter

}
catch (Exception)
{
// Apparently the exception message was not serialized by the worker middleware, continue
Collaborator

Do we want to log a message saying what's in this comment? Will this be useful for debugging issues in the future?

Contributor Author

Done - the actual warning is logged using TraceHelper.ExtensionWarningEvent in CallActivityAsync based on whether the GetFailureDetails call reports successful deserialization of the new message.
If there is a better way to log this, please let me know.

Contributor Author

Oh also, a consideration: GetFailureDetails is used for all languages that use OutOfProcMiddleware, so I believe Java/others will now log this warning on every activity failure, as the worker extensions for these languages do not know to serialize the exception details in this way.
We may eventually want to update the other extensions in a similar way - happy to remove/comment this warning until we expect all languages to support exception serialization, or I can try to check the language and only warn for dotnet, or leave the warning in as-is, if we think that the warning will not be distracting for other languages.

Contributor Author

Update - added a check for dotnet isolated
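
As a rough sketch of the behaviour discussed in this thread (fall back to the raw message when it was not serialized, and only warn for dotnet-isolated), something like the following could work. `FailureWarningSketch`, `ResolveMessage`, and the `ErrorMessage` property name are hypothetical; the real logic lives in `GetFailureDetails` and `CallActivityAsync`, which are not shown here. The sketch also assumes the worker runtime name is available as a string, for example the value of the `FUNCTIONS_WORKER_RUNTIME` app setting.

```csharp
using System;
using System.Text.Json;

public static class FailureWarningSketch
{
    // Assumption: runtime name as reported by FUNCTIONS_WORKER_RUNTIME.
    public const string DotnetIsolated = "dotnet-isolated";

    // Returns the message to surface. Sets shouldWarn only when the payload was
    // not serialized AND the worker is one that is expected to serialize it.
    public static string ResolveMessage(string rawMessage, string? workerRuntime, out bool shouldWarn)
    {
        shouldWarn = false;
        try
        {
            using JsonDocument doc = JsonDocument.Parse(rawMessage);
            if (doc.RootElement.ValueKind == JsonValueKind.Object &&
                doc.RootElement.TryGetProperty("ErrorMessage", out JsonElement message) &&
                message.ValueKind == JsonValueKind.String)
            {
                return message.GetString()!;
            }
        }
        catch (JsonException)
        {
            // Not JSON at all: fall through and use the raw message.
        }

        // Other languages (Java and so on) do not serialize details this way,
        // so warning for them on every activity failure would only add noise.
        shouldWarn = string.Equals(workerRuntime, DotnetIsolated, StringComparison.OrdinalIgnoreCase);
        return rawMessage;
    }
}
```

The caller would log the warning (for example through the extension's trace helper) only when `shouldWarn` comes back `true`.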

Collaborator

> as the worker extensions for these languages do not know to serialize the exception details in this way.

Can you clarify what you mean here?

- Add tests for entity error handling
- Revert to synchronous worker middleware
- Log a warning if exception message not serialized
Member

@cgillum cgillum left a comment

LGTM - just adding a few more comments/questions.


internal class DurableSerializationException : Exception
{
private Exception FromException;
Member

Fields should be lowercased and, whenever possible, should be readonly.

Suggested change
- private Exception FromException;
+ private readonly Exception fromException;

[DurableClient] DurableTaskClient client,
FunctionContext executionContext)
{
ILogger logger = executionContext.GetLogger("RethrowActivityException_HttpStart");
Member

Should this be:

Suggested change
- ILogger logger = executionContext.GetLogger("RethrowActivityException_HttpStart");
+ ILogger logger = executionContext.GetLogger("CatchHttpStart");


public static class ActivityErrorHandling
{
private static ConcurrentDictionary<string, int> retryCount = new ConcurrentDictionary<string, int>();
Member

nit: consider naming this globalRetryCount to make it clear that it's a global (static) value.
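
To illustrate why the "global" prefix is informative, here is a small sketch of how a static counter like this one might drive a retry test. It assumes the counter tracks attempts per key so an activity can fail a fixed number of times before succeeding; `RetryCounterSketch` and `FailUntilAttempt` are hypothetical and are not taken from the PR's test code.

```csharp
using System;
using System.Collections.Concurrent;

public static class RetryCounterSketch
{
    // Static, so the count is shared by every invocation in the process.
    private static readonly ConcurrentDictionary<string, int> globalRetryCount = new();

    // Throws on the first `failuresBeforeSuccess` attempts for a key, then succeeds.
    public static string FailUntilAttempt(string key, int failuresBeforeSuccess)
    {
        // AddOrUpdate returns the new value: 1 on the first call, then 2, 3, ...
        int attempt = globalRetryCount.AddOrUpdate(key, 1, (_, current) => current + 1);
        if (attempt <= failuresBeforeSuccess)
        {
            throw new InvalidOperationException($"Attempt {attempt} for '{key}' failed");
        }

        return $"Succeeded on attempt {attempt}";
    }
}
```

Because the dictionary outlives any single invocation, reusing a key across tests would carry the count over, which is the kind of surprise the suggested name helps flag.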

for (int i = 0; i < 5; i++)
{
Thread.Sleep(1000);
}
Member

What's the purpose of having a sleep in a for-loop vs. just doing Thread.Sleep(5000)?

Development

Successfully merging this pull request may close these issues.

[out-of-proc] RetryContext.LastFailure incorrectly captures thrown exception
3 participants