Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tokio::signals::windows - are windows signals correctly handled? #7039

Open
Astlaan opened this issue Dec 16, 2024 · 21 comments · May be fixed by #7122
Open

Tokio::signals::windows - are windows signals correctly handled? #7039

Astlaan opened this issue Dec 16, 2024 · 21 comments · May be fixed by #7122
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-signal Module: tokio/signal

Comments

@Astlaan
Copy link

Astlaan commented Dec 16, 2024

Version
v1.42.0

Platform
Windows

Description
The issue was been discussed in another tokio-dependent crate.
Full discussion here: Finomnis/tokio-graceful-shutdown#94 (comment)

In tokio, in this section:

unsafe extern "system" fn handler(ty: u32) -> BOOL {

unsafe extern "system" fn handler(ty: u32) -> BOOL {
    let globals = globals();
    globals.record_event(ty as EventId);

    // According to https://docs.microsoft.com/en-us/windows/console/handlerroutine
    // the handler routine is always invoked in a new thread, thus we don't
    // have the same restrictions as in Unix signal handlers, meaning we can
    // go ahead and perform the broadcast here.
    if globals.broadcast() {
        1
    } else {
        // No one is listening for this notification any more
        // let the OS fire the next (possibly the default) handler.
        0
    }
}

The HandlerRoutine is will defined. It later gets registered via the SetConsoleCtrlHandler. This gets done in the tokio code here.

According to:
HandlerRoutine reference

It seems that if the handler returns 1, then the console understands this as the handler having handled the graceful closure of the program, and windows terminates the program. If 0 is passed, "next handler function in the list of handlers for this process is used". SetConsoleCtrlHandler reference:

When a console process receives any of the control signals, its handler functions are called on a last-registered, first-called basis until one of the handlers returns TRUE. If none of the handlers returns TRUE, the default handler is called.

But in the tokio-defined handler function, broadcast() is called, which "Returns true if an event was delivered to at least one listener". But then broadcast() evaluates to true, the handler function returns true, telling windows that the defined handler handled the signal.

  • If we are dealing with a CtrlC signal, I guess this should be fine. The handler informs Windows that the signal has been handled, and windows doesn't call the default handler, which would terminate the program.
  • However, if we are dealing with a with something like a CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT, or CTRL_SHUTDOWN_EVENT, according to the HandlerRoutine reference, Windows immediately terminates the process once a true return is received (unlike CTRL_Cevents

It seems that the way tokio implements it doesn't even give a change for the program to do anything with the signal in the case of CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT, or CTRL_SHUTDOWN_EVENT events, since it broadcasts it to the listeners, and if they exist, the handler function returns a true, resulting in Windows terminating the program right away?

Or am I misinterpreting something in the Tokio code?

How to fix?

If I interpreted the Tokio mechanism correctly, then I guess the way to fix would be something like making the handler function wait for the program to complete.

One of the limiting factors is that, when the console emits CTRL_CLOSE_EVENT, CTRL_LOGOFF_EVENT, or CTRL_SHUTDOWN_EVENT, Windows grants a limited time for the program to clean itself: Timeouts.

The CTRL_CLOSE_EVENT timeout is 5 s, for example. If the program is not done cleaning after these 5 seconds, it will be forcefully terminated.

Possible solutions:

  1. Make the handler return false always, telling the user to register another handler function, with the cleanup he wants to do. Also tell the user that his handler function must be defined BEFORE the Tokio signal is registered, if the signal is to be of any use as well (due to the last-registered, first-called).
  2. Put a sleep on Tokio's handler function, so that that function never returns, and the program always has the full timeout time to clean after itself. Sleep works because the handler is run in another thread, so wouldn't affect the main thread.
  3. Some smarter method where the handler stops on some synchronization flag, that then can be switched by the main thread when the cleanup is done, and then lets the handler function proceed and finalize.

EDIT: possibly it is a good idea to always return False, regardless of the fix above. This will allow the user to keep defining custom CtrlHandler functions to be long alongside tokio's, if he so wishes (with the ordering warning mentioned in (1). This would invalidate approach (2) however, which won't give time for other handlers to run.

(1) is probably the most hacky. (2) would integrate the most seamlessly, however, preventing the user from running their own ctrl handlers. (3) is probably the most solid, but would imply having a synchronization variable, and some kind of action from the program that it is done cleaning itself. This would however result in code that is not cross-platform in the sense that this synchronization is not needed with Unix signals.

@Astlaan Astlaan added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Dec 16, 2024
@Darksonn Darksonn added the M-signal Module: tokio/signal label Dec 16, 2024
@Darksonn
Copy link
Contributor

There was a previous issue on this: #6735.

cc @ipetkov

@ipetkov
Copy link
Member

ipetkov commented Dec 17, 2024

Put a sleep on Tokio's handler function, so that that function never returns, and the program always has the full timeout time to clean after itself. Sleep works because the handler is run in another thread, so wouldn't affect the main thread.

I don't think this design would be sufficient: whatever sleep time we choose, there will always be applications for whom the time is either too long or too short (and making it configurable would also be a pain to use).

Some smarter method where the handler stops on some synchronization flag, that then can be switched by the main thread when the cleanup is done, and then lets the handler function proceed and finalize.

Perhaps the best approach would be to:

  • Deprecate the current APIs
  • Introduce v2 versions of the APIs which block the handler until the event object received through the channel is dropped. Then a consumer can control exactly when it is safe to bring the process down

Though the change would be subtle enough (e.g. easy to drop the event "handle" if it doesn't have any obvious use) that I think it warrants a new API rather than trying to change the existing ones and cause confusion

@Astlaan
Copy link
Author

Astlaan commented Dec 17, 2024

@ipetkov

I don't think this design would be sufficient: whatever sleep time we choose, there will always be applications for whom the time is either too long or too short (and making it configurable would also be a pain to use).

Well at most Windows grants 5s. Even if you make a 5s sleep on the handler, when you click the X button the console still closes immediately and the process keeps running solely on the background. So for a user perspective I think there wouldn't be much difference. The user expects the console to close right away and that's still what happens, regardless of blocking the handler for 5s or not. But I agree that it is not the most elegant solution, and would likely prevent the developer from choosing to define their own handlers via winapi, which probably is not good policy.

Perhaps the best approach would be to:
Deprecate the current APIs
Introduce v2 versions of the APIs which block the handler until the event object received through the channel is dropped. Then a consumer can control exactly when it is safe to bring the process down
Though the change would be subtle enough (e.g. easy to drop the event "handle" if it doesn't have any obvious use) that I think it warrants a new API rather than trying to change the existing ones and cause confusion

But are you proposing OS-specific methods, or a cross platform one? On one hand it would be nice to have a cross-platform method. On the other, that would require slightly complicating the procedure for Unix users, since they had no handler they had to drop in the first place, due to the different mechanism in Unix.

@ipetkov
Copy link
Member

ipetkov commented Dec 19, 2024

But are you proposing OS-specific methods, or a cross platform one?

OS (Windows) specific methods. As in we should mark the current versions of tokio::signal::windows::* as deprecated and introduce new ones that are designed for giving the caller control on when the process goes down.

I would avoid trying to design a cross-platform API since the Windows and Unix designs are just too different. If we ever come up with an elegant cross-platform solution we can always introduce it later, but trying to fit it all up front might not lead to success

@Finomnis
Copy link
Contributor

Finomnis commented Dec 20, 2024

which block the handler until the event object received through the channel is dropped.

I think I could work with this. (maintainer of tokio-graceful-shutdown)
It will require a rewrite of the OS signal code in my crate, but it feels like it should work.

@Finomnis
Copy link
Contributor

Finomnis commented Dec 21, 2024

@Darksonn After reading through this again, I think that a wait wouldn't be bad, actually. I would even go so far as to say that a wait(forever) would be a good idea; assuming that the thread gets automatically terminated at the end of main(). Then, the signal would never be resolved, and the actual response of the program would be to shut down. That would align with the behavior in Linux.

Again, if the thread gets reliably terminated after the main function exited. Not sure if there's still issues where exiting main does not do that properly; I remember some situations where that was a problem.

@Darksonn
Copy link
Contributor

Seems like a plausible solution for the existing function.

@Finomnis
Copy link
Contributor

Finomnis commented Jan 10, 2025

Any news on this?

I would give it a try, but I'm unsure how to implement his.

Theoretically, it should kind of be sufficient to add a 'sleep(forever)' in the handler() function (for some of the signals), but there are two problems:

  • This will break all unit tests in the same file, as they will now never finish
  • Programs that exit by simply returning from main() might get stuck, if the handler thread that windows created was not demonized. So we maybe need a mechanism of returning from handler once the related tokio runtime was exited; unsure how to approach this. The current approach has no way to easily retrofit something like this into it (globals() doesn't even know a tokio runtime exists)

@Darksonn Maybe you can give me some hints, in case I'm missing something trivial.

@Finomnis
Copy link
Contributor

@Astlaan What happens if the signal handler calls sleep(100 seconds), but everything else returns after 1s? (Specifically, main())

Will the program then still end after 1s, or does the signal handler itself keep the program alive, and the 5s timeout happens?

@Astlaan
Copy link
Author

Astlaan commented Jan 16, 2025

@Finomnis Not sure, I didn't test that, because the tokio signal was not working.
But I would assume the signal handler keeps the program alive, I had the impression it was design in such a way that the cleanup might be done in the ctrl handler function.

Unless you call a process::exit in one of the threads, I think the process keeps running till all threads terminate.

But didn't test this to know for sure.

@Darksonn
Copy link
Contributor

Regarding the tests, you can probably move the ones that need to sleep to a background test to get them to pass.

Programs that exit by simply returning from main() might get stuck, if the handler thread that windows created was not demonized. So we maybe need a mechanism of returning from handler once the related tokio runtime was exited; unsure how to approach this. The current approach has no way to easily retrofit something like this into it (globals() doesn't even know a tokio runtime exists)

Please tell me more about this. Which thread does the callback run on?

@Finomnis
Copy link
Contributor

@Darksonn according to the windows api, on a newly created thread spawned by windows.

@Darksonn
Copy link
Contributor

If it runs on a newly created thread, I don't see any problem. I would expect the program to exit when returning from main, though of course we should double check.

@Finomnis
Copy link
Contributor

@Darksonn Are threads killed by default after return of main? I thought you need sys::exit for that, but maybe I misremember

@Darksonn
Copy link
Contributor

Rust kills all threads when you exit, yes.

In the specific case of #[tokio::main], Tokio will wait for Tokio threads to exit before it returns from the true Rust main function. But once all Tokio threads (of which the signal thread is not one) have exited, Tokio returns from main, which in turn causes Rust to exit the process.

@Finomnis
Copy link
Contributor

@Darksonn will in that case the fix should be as easy as an infinite sleep in the handler functions.

@Darksonn
Copy link
Contributor

I mean, please double check that it actually works, but yes I believe that would work.

@Finomnis
Copy link
Contributor

@Darksonn Can confirm it behaves as you believe.

@Finomnis
Copy link
Contributor

@Astlaan Would you mind checking if #7122 fixes your problem and works as intended?

To do so, simply add the following at the end of your Cargo.toml:

[patch.crates-io]
tokio = { git = "https://github.com/Finomnis/tokio.git", branch = "fix_windows_signals" }

@Finomnis
Copy link
Contributor

Regarding the tests, you can probably move the ones that need to sleep to a background test to get them to pass.

Would you mind elaborating how I do that? :)

@Finomnis
Copy link
Contributor

Finomnis commented Jan 25, 2025

#7122 Seems to work if I run it in CMD and then exit the command shell. However, it does not work when I click 'End Task' in the Task manager, although it's supposed to work according to the documentation of the CTRL_CLOSE event.

I suspect windows is trying to send WM_CLOSE instead of CTRL_CLOSE and I don't know how to instruct Rust to prevent that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-signal Module: tokio/signal
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants