thread_signal/2 throws an existence error for threads that terminated but are not yet joined #1236

Open
pmoura opened this issue Feb 9, 2024 · 11 comments

Comments

@pmoura
Contributor

pmoura commented Feb 9, 2024

Consider:

$ swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 9.3.0-17-g4d781a64e)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

?- thread_create(true, _, [alias(t)]).
true.

?- thread_property(t, P).
P = id(3) ;
P = alias(t) ;
P = status(true) ;
P = detached(false) ;
P = debug(true) ;
P = engine(false) ;
false.

?- thread_signal(t, throw(e)).
ERROR: thread `t' does not exist
ERROR: In:
ERROR:   [12] thread_signal(t,throw(e))
ERROR:   [11] toplevel_call(user:user: ...) at /Users/pmoura/lib/swipl/boot/toplevel.pl:1317

?- thread_property(t, P).
P = id(3) ;
P = alias(t) ;
P = status(true) ;
P = detached(false) ;
P = debug(true) ;
P = engine(false) ;
false.

?- thread_join(t, S).
S = true.

?- thread_create((repeat,fail), _, [alias(w)]).
true.

?- thread_signal(w, throw(e)).
true.

?- thread_join(w, S).
S = exception(e).

The exception is arguably misleading, and this behavior forces wrapping thread_signal/2 calls in catch/3, because a thread may terminate between the check that it is running and the call to the predicate.

@pmoura
Contributor Author

pmoura commented Feb 9, 2024

Same problem with thread_send_message/2:

$ swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 9.3.0-17-g4d781a64e)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.

For online help and background, visit https://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).

?- thread_create(true, _, [alias(t)]).
true.

?- thread_send_message(t, foo).
ERROR: thread `t' does not exist
ERROR: In:
ERROR:   [12] thread_send_message(t,foo)
ERROR:   [11] toplevel_call(user:user: ...) at /Users/pmoura/lib/swipl/boot/toplevel.pl:1317

Again, this behavior forces the use of a catch/3 wrapper.
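For readers following along, the kind of wrapper meant here can be sketched as below. This is only a minimal sketch: the helper names signal_if_alive/2 and send_if_alive/2 are hypothetical (not part of SWI-Prolog), and the error term is assumed to be error(existence_error(thread, _), _), which is what the "thread `t' does not exist" message above suggests.

```prolog
% Hypothetical helpers, not part of SWI-Prolog.
% Assumption: a terminated or joined thread makes thread_signal/2 raise
% error(existence_error(thread, _), _), as the transcripts above suggest.

:- meta_predicate signal_if_alive(+, 0).

% signal_if_alive(+Thread, :Goal)
% Signal Thread; succeed silently if the thread no longer exists.
signal_if_alive(Thread, Goal) :-
    catch(thread_signal(Thread, Goal),
          error(existence_error(thread, _), _),
          true).

% send_if_alive(+Thread, +Message)
% Send Message; succeed silently on any existence error. The kind of
% the missing object is left open because it is an assumption here.
send_if_alive(Thread, Message) :-
    catch(thread_send_message(Thread, Message),
          error(existence_error(_, _), _),
          true).
```

With these helpers, the failing calls above become signal_if_alive(t, throw(e)) and send_if_alive(t, foo). Any other error (for example, a type error) still propagates.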

@JanWielemaker
Member

What else do you want? Existence is a bit misleading, but a completed thread is what Unix calls a zombie process: the thing is gone, but there is still an entry in the thread/process table that allows for join/wait. It is in no way capable of processing the signal or message. We could consider another exception (permission error?), but IMO that makes things worse as it would require catching two different exceptions. If the misleading error message is your (only) concern we could add a comment to the 2nd argument of the error term?

I'm also no fan of the requirement to use catch/3. I see little alternative though. It is a bit like opening a file. In fairly static environments testing the access first may be defensible, but in a dynamic environment you must use catch/3 because the file may disappear or change permissions between the two calls.

@pmoura
Contributor Author

pmoura commented Feb 9, 2024

> What else do you want? Existence is a bit misleading, but a completed thread is what Unix calls a zombie process: the thing is gone, but there is still an entry in the thread/process table that allows for join/wait. It is in no way capable of processing the signal or message.

> I'm also no fan of the requirement to use catch/3. I see little alternative though. It is a bit like opening a file. In fairly static environments testing the access first may be defensible, but in a dynamic environment you must use catch/3 because the file may disappear or change permissions between the two calls.

If the thread message queue (the one used by thread_send_message/2) and the signal queue were only reclaimed when thread_join/2 is called, there would be no exceptions and thus no need for a catch/3 wrapper to account for the often unpredictable cases where a thread terminates between checking that it's running and calling thread_signal/2 or thread_send_message/2. Of course, if the thread has terminated by the time those calls are processed, they would be no-ops. At least in the particular case of thread_signal/2, which is used mainly to stop or debug a thread, that should not be an issue.

P.S. This implementation choice is found (from my limited testing) in ECLiPSe and Trealla Prolog. It's also how it's implemented in LVM.

@JanWielemaker
Member

In SWI-Prolog at least, the entire thread structure is cleared when the thread terminates. So, there is no place to deliver a signal or a message. There is also no point as it would not be processed anyway.

I agree that a signal intended to tear down the thread could be ignored if it is already dead. The only candidate for that seems to be thread_signal(Target, abort) though. Pretty much any other signal may have other intents. For thread messages the situation is a little more difficult: while signals are always handled if the thread is still alive, thread messages may in general be handled or not, and unhandled messages are silently discarded when the thread dies (possibly not a good idea as I think about it). The sender basically never knows unless some form of report-back is implemented. On the other hand, if we have a thread that is designed to process messages forever (a very common case) and it stops doing so due to a failure or exception, it is quite nice to get an exception.

Do you have documentation from the other systems on how this is handled? I'm happy to discuss the topic with other developers.

@pmoura
Contributor Author

pmoura commented Feb 9, 2024

> In SWI-Prolog at least, the entire thread structure is cleared when the thread terminates.

How difficult would it be for that to happen only for detached threads, postponing it for attached threads until they are joined? Also, how likely is it that this change in semantics/behavior would break existing applications?

> So, there is no place to deliver a signal or a message. There is also no point as it would not be processed anyway.

Indeed, they would be no-ops (as I mentioned above), but that would avoid the need for catch/3 wrappers.

> I agree that a signal intended to tear down the thread could be ignored if it is already dead. The only candidate for that seems to be thread_signal(Target, abort) though. Pretty much any other signal may have other intents. For thread messages the situation is a little more difficult: while signals are always handled if the thread is still alive, thread messages may in general be handled or not, and unhandled messages are silently discarded when the thread dies (possibly not a good idea as I think about it). The sender basically never knows unless some form of report-back is implemented. On the other hand, if we have a thread that is designed to process messages forever (a very common case) and it stops doing so due to a failure or exception, it is quite nice to get an exception.

A possible alternative in the last scenario would be to use thread_property/2 to check that the thread is still running. Not exactly the same thing, I agree.
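As a rough illustration of why the check is "not exactly the same thing", here is a minimal sketch of the check-then-act pattern. The helper name signal_if_running/2 is hypothetical; thread_property/2 with status(running) is the check shown in the transcripts above.

```prolog
% Hypothetical helper, not part of SWI-Prolog.
% Check-then-act: the thread may terminate between the status check and
% the thread_signal/2 call, so the call may still raise an existence
% error. The check narrows the window; it does not close it.

:- meta_predicate signal_if_running(+, 0).

signal_if_running(Thread, Goal) :-
    (   thread_property(Thread, status(running))
    ->  thread_signal(Thread, Goal)   % may still throw if Thread just ended
    ;   true                          % already terminated: treat as a no-op
    ).
```

A caller that needs a guarantee would still have to combine this with catch/3, which is the point made earlier in the thread.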

> Do you have documentation from the other systems on how this is handled? I'm happy to discuss the topic with other developers.

I don't think this level of implementation details is explicit in the documentation of other open-source systems. At least not that I could find in a quick search. I'm part of the team developing LVM, but this is a commercial system and its documentation is not (currently) publicly available.

A discussion between developers would be welcome. My idea (if I ever find the time) is to update the threads draft standardization proposal (which Trealla Prolog is currently using as a guide) and add a test set to the Logtalk distribution. It would be great to minimize the differences between systems for better portability of multi-threading applications.

@JanWielemaker
Member

> How difficult would it be for that to happen only for detached threads, postponing it for attached threads until they are joined? Also, how likely is it that this change in semantics/behavior would break existing applications?

It is probably easier to silently ignore messages and signals when we detect that the thread is in a zombie state. I don't really expect that to break properly functioning applications.

I expect that silently ignoring signals and messages that cannot be delivered is more a cause of problems than a way to avoid them. Notably, you typically send a message to a thread because you want it to be processed. For signals the story is a bit different. Most signals are for aborting or debugging. I have also used signals to actually make threads do something though. For sending messages we have an option list that we could use to avoid an error (like close/2). We do not have that for thread_signal/2. One could also consider a high-level interface for aborting and joining a thread. The debug usage is mostly interactive and controlled by higher-level utilities.

> A discussion between developers would be welcome.

If you organize one, I'm happy to join. You've done a lot of good work for the standard and I still regret that it didn't continue. The SWI-Prolog thread API has evolved quite a bit since then.

@pmoura
Contributor Author

pmoura commented Feb 9, 2024

Using an option in thread_send_message/3 and in an upcoming thread_signal/3 predicate to decide the behavior when the thread is no longer running sounds like a good way forward without introducing backwards compatibility issues. The option could be named e.g. errors(Action), with the possible values for Action being throw, fail, and succeed.
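To make the proposal concrete, here is a minimal sketch of how such an option could behave, written as a user-level wrapper. Everything in it is an assumption for illustration: the name thread_signal_opt/3 is hypothetical (chosen to avoid clashing with any real thread_signal/3), the default of throw is a guess at a backwards-compatible choice, and the matched error term is the one suggested by the messages earlier in this issue.

```prolog
:- use_module(library(option)).   % option/3
:- use_module(library(error)).    % must_be/2

:- meta_predicate thread_signal_opt(+, 0, +).

% thread_signal_opt(+Thread, :Goal, +Options)
% Hypothetical wrapper, not part of SWI-Prolog. Supports the proposed
% errors(Action) option with Action one of throw (assumed default),
% fail, or succeed.
thread_signal_opt(Thread, Goal, Options) :-
    option(errors(Action), Options, throw),
    must_be(oneof([throw, fail, succeed]), Action),
    catch(thread_signal(Thread, Goal),
          Error,
          handle_signal_error(Action, Error)).

% Only the "thread does not exist" case is affected by Action;
% every other error is rethrown unchanged.
handle_signal_error(succeed, error(existence_error(thread, _), _)) :- !.
handle_signal_error(fail,    error(existence_error(thread, _), _)) :- !, fail.
handle_signal_error(_, Error) :-
    throw(Error).
```

For example, thread_signal_opt(t, throw(e), [errors(succeed)]) would be a no-op for a thread that has already terminated, while omitting the option keeps the current behavior.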

@JanWielemaker
Member

> The option could be named e.g. errors(Action) with the possible values for Action being throw, fail, succeed.

Or copy ISO close/2, which implements force(true) to ignore any error. I'm no fan, but if such a thing is acceptable to some relevant Prolog implementations, I'm happy with the compromise. One still would need to define what needs to happen if the thread is already joined or, if it is a detached thread, terminated and vanished completely.

@kamahen
Member

kamahen commented Feb 9, 2024

My experience from ~20 years ago, using POSIX threads on a non-Unix real-time OS (VxWorks, IIRC), is that if you don't do things exactly right (*), all kinds of weird things can happen -- and I don't see how (or why) SWI-Prolog should deal with those situations. There's only so much you can do when the underlying system is buggy or badly designed. (In the case of VxWorks, my recollection is that it had its own threading model and provided a POSIX API that was either not quite compliant or buggy or both.)
So, your proposal might fix the problem on one OS but not on another - and possibly might make things worse on another OS.
(Maybe the API for pthreads has improved over the years, but when I encountered it, I did not enjoy the experience.)

(*) Where "exactly right" was often undefined in the documentation.

@JanWielemaker
Member

The implementation is not really a problem. Linux pthreads is rock solid. MacOS has a few tweaks I managed to work around. The Windows implementation has some limits one can work around mostly by using native Windows alternatives for some. NetBSD and OpenBSD had some flaws in the past, but seem stable now as well.

The simple question is what to do if you talk to a thread that has terminated but is not yet joined. It seems some systems silently ignore the signals and messages, while SWI-Prolog raises an exception. I still think that is what should happen. The alternative is much harder. Checking that it is still alive before sending a message is no guarantee that it is alive when you send the message. I'm more tempted to add a warning, similar to the one for detached threads not exiting cleanly, for threads that have pending messages in their input queue when they are joined or (for detached threads) die.

@pmoura
Contributor Author

pmoura commented Feb 10, 2024

From this discussion and my own experience, it seems clear that, depending on the application, we ideally want to either silently succeed or throw an exception when sending a message or a signal to a terminated (but not yet joined) attached thread. My preference is to be able to select the desired behavior using an option. For systems like LVM and Trealla Prolog, where the implementation of multi-threading features is a work in progress, this is an ideal time to sync on a common solution. I will draw the attention of ECLiPSe and YAP developers to this discussion. Thanks for all the feedback.
