Crash when using _Py_DumpTracebackThreads #128400

Closed
nascheme opened this issue Jan 2, 2025 · 9 comments
Labels
topic-free-threading type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@nascheme
Member

nascheme commented Jan 2, 2025

Crash report

What happened?

# triggering program
# based on test.test_faulthandler.FaultHandlerTests.test_dump_traceback_threads

import faulthandler
from threading import Thread, Event


class Waiter(Thread):

    def __init__(self):
        Thread.__init__(self)
        self.running = Event()
        self.stop = Event()

    def run(self):
        self.running.set()
        self.stop.wait()


def main():
    for i in range(100):
        waiter = Waiter()
        waiter.start()
        waiter.running.wait()
        faulthandler.dump_traceback(all_threads=True)
        waiter.stop.set()
        waiter.join()

if __name__ == '__main__':
    main()

This will cause the interpreter to segfault if built with --disable-gil --with-pydebug. I've bisected it to this commit:

b2afe2a: gh-123924

Using gdb:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
_PyFrame_GetCode (f=f@entry=0x7ffff7fb32d8) at ./Include/internal/pycore_frame.h:83
83	    assert(PyCode_Check(executable));
(gdb) p executable 
$7 = 0x0
(gdb) bt
#0  _PyFrame_GetCode (f=f@entry=0x7ffff7fb32d8) at ./Include/internal/pycore_frame.h:83
#1  PyUnstable_InterpreterFrame_GetLine (frame=frame@entry=0x7ffff7fb32d8) at Python/frame.c:149
#2  0x000055555590e9eb in dump_frame (fd=2, frame=0x7ffff7fb32d8) at Python/traceback.c:905
#3  dump_traceback (fd=fd@entry=2, tstate=tstate@entry=0x555555cef140, write_header=write_header@entry=0) at Python/traceback.c:974
#4  0x000055555590ed1c in _Py_DumpTracebackThreads (fd=2, interp=<optimized out>, interp@entry=0x0, current_tstate=0x555555c5dea8 <_PyRuntime+359528>) at Python/traceback.c:1090
#5  0x0000555555924489 in faulthandler_dump_traceback_py (self=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at ./Modules/faulthandler.c:240
#6  0x00005555556feae1 in cfunction_call (func=func@entry=<built-in method dump_traceback of module object at remote 0x20000778900>, args=args@entry=(), kwargs=kwargs@entry={'all_threads': True})
[...]

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.14.0a0 experimental free-threading build (bisect/bad:b2afe2aae48, Jan 1 2025, 17:09:16) [Clang 19.1.6 (++20241217105838+657e03f8625c-1exp120241217105944.74)]
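
For anyone trying to reproduce this, the crash is reported only for a `--disable-gil` build, so it helps to confirm what kind of interpreter is running first. The following is a minimal sketch, not part of the original report: the helper name `describe_gil_build` is hypothetical, and it assumes `sys._is_gil_enabled()` is available (Python 3.13 and later), falling back to "GIL enabled" when it is not.

```python
import sys
import sysconfig


def describe_gil_build():
    """Return (free_threaded_build, gil_enabled) for the running interpreter.

    describe_gil_build is a hypothetical helper name used for illustration.
    """
    # Py_GIL_DISABLED is set to 1 when CPython was configured with --disable-gil.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() exists on Python 3.13+; older versions always have a GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = is_gil_enabled() if is_gil_enabled is not None else True

    return free_threaded_build, gil_enabled


free_threaded, gil_on = describe_gil_build()
print(f"free-threaded build: {free_threaded}, GIL currently enabled: {gil_on}")
```

If the first value is false, the triggering program above is not expected to hit this particular crash.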


@nascheme nascheme added type-crash A hard crash of the interpreter, possibly with a core dump topic-free-threading labels Jan 2, 2025
@ZeroIntensity
Member

It's probably because accessing fields on another thread state isn't very safe, especially on free-threading. Maybe we should just make a stop-the-world pause for manual invocation of faulthandler?

@colesbury
Contributor

> Maybe we should just make a stop-the-world pause for manual invocation of faulthandler?

Yes, I think that would make sense.

@colesbury
Contributor

@ZeroIntensity - if you're interested in a related follow-up thread safety issue, I think it'd be helpful to make faulthandler only dump the current thread's stack when called from a signal handler. Basically, I think we should ignore all_threads=True in the free threading build because accessing other threads' stacks while they're running is likely to crash. It's not exactly thread-safe in the default build either, but it mostly works whereas in the free threading build it frequently crashes.
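
As an illustration of the "current thread only" behavior discussed here, the documented `all_threads=False` argument to `faulthandler.dump_traceback` already restricts the dump to the calling thread, so no other thread's frames are read. This is a user-level sketch under that assumption, not the change made in CPython; the helper name `dump_current_thread_only` is hypothetical.

```python
import faulthandler
import tempfile


def dump_current_thread_only():
    """Return the calling thread's traceback as text (hypothetical helper)."""
    # faulthandler writes directly to a file descriptor, so a real file
    # is needed; an in-memory io.StringIO has no fileno().
    with tempfile.TemporaryFile(mode="w+") as f:
        # all_threads=False dumps only the current thread's stack.
        faulthandler.dump_traceback(file=f, all_threads=False)
        f.seek(0)
        return f.read()


text = dump_current_thread_only()
print(text)
```

The output lists only frames of the thread that made the call, which is why it avoids the cross-thread access described above.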

@ZeroIntensity
Member

Sure, I can do that.

@dgrisby
Contributor

dgrisby commented Jan 2, 2025

Even with the GIL, faulthandler is not safe to use in a multi-threaded process: #116008

@colesbury
Contributor

@dgrisby, yes, but there's a meaningful difference between "mostly works" and "mostly crashes". We'd like to get to the point where faulthandler works about as reliably in the free threading build as it does in the GIL-enabled build, even if that still has bugs.

@colesbury
Contributor

This is fixed now

@dgrisby
Contributor

dgrisby commented Jan 3, 2025

> @dgrisby, yes, but there's a meaningful difference between "mostly works" and "mostly crashes". We'd like to get to the point where faulthandler works about as reliably in the free threading build as it does in the GIL-enabled build, even if that still has bugs.

If that meets your use case, I suppose. In the environment I work in, we have long-lived services that must not crash. For us, "mostly works" is equivalent to "too dangerous to use".

@colesbury
Contributor

The intention of my comment was to explain why we are bothering with fixes to faulthandler given its limitations. I'm not suggesting that you use faulthandler.

WolframAlph pushed a commit to WolframAlph/cpython that referenced this issue Jan 4, 2025