-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault when updating from 3.13.0 to 3.13.1 while Python is running #128341
Comments
Starting Python 3.13.1, downgrading it to 3.13.0, importing |
This is an interesting bug. My first guess would be that there's some sort of race happening with a C extension's shared object file, but A few other tricks for debugging that would be helpful:
cc @encukou as a Fedora expert. |
From the backtrack, this is inseparable from
ref: https://github.com/python/cpython/blob/main/Modules/_datetimemodule.c cpython/Modules/_datetimemodule.c Line 122 in c810ed7
|
This issue seems to be caused by the changes introduced in commit d82a7ba (gh-120004). The address defined by the INTERP_KEY macro, which is used for the module state, appears to become invalid after the shared library for the This hypothesis can be confirmed by statically compiling the *static*
_datetime _datetimemodule.c |
Thanks for the ping!
That's where user expectations clash with what we do. Standard library modules use an internal ABI, which doesn't need to be compatible. But, given:
we probably want to change that. If we want to fix this (rather than write it off as unsupported), I'd recommend:
To me that seems like the correct thing to do, but since this would bake the modules into libpython or the main executable, making it bigger, it's bound to get pushback from someone. |
I think the more ideal solution would be to instead ensure private ABI compatibility with dynamically loaded extensions between patch versions (we need the private API for a lot of extensions, and making them all statically linked for this single use case seems like a bad idea). But I thought we already had a |
The crash is indeed from |
And this Setup.local makes swapping Python 3.13.0 for 3.13.1 or back while it runs import the module sucesfully.
I've tried importing all modules from the dynload directory. I get:
Everything else seems OK. So the
Which is still better than a segfault and the datetime module will use the Python implementation from So I might as well do this. However, I agree with @ZeroIntensity -- this should not happen on a patch version update. |
Should I open a separate issue for |
Maybe not, we should first solve the abnormality caused by this problem, that say which is the code line I mentioned above. |
That would severely limit our ability to fix bugs. If the ABI is stable, it's usually trivial to keep the API stable as well. (Not always, but very often). So, essentially we'd need to treat internal API as Unstable.
We do, but if it detects a change, the Release Manager can still approve the PR. And RMs generally do that for internal ABI.
No. If a module imports successfully, its functions could still crash this way. |
A few technical details about the actual root cause. It appears that a new struct field was added between Python 3.13.0 and 3.13.1: $ git diff v3.13.0 v3.13.1 Include/internal/pycore_global_strings.h
diff --git a/Include/internal/pycore_global_strings.h b/Include/internal/pycore_global_strings.h
index a5436b6d689..cad2d1a8d22 100644
--- a/Include/internal/pycore_global_strings.h
+++ b/Include/internal/pycore_global_strings.h
@@ -47,6 +47,7 @@ struct _Py_global_strings {
STRUCT_FOR_STR(json_decoder, "json.decoder")
STRUCT_FOR_STR(kwdefaults, ".kwdefaults")
STRUCT_FOR_STR(list_err, "list index out of range")
+ STRUCT_FOR_STR(str_replace_inf, "1e309")
STRUCT_FOR_STR(type_params, ".type_params")
STRUCT_FOR_STR(utf_8, "utf-8")
} literals; This addition changed the offset of the
$ gdb -batch -ex "disassemble/mr get_current_module" Modules/_datetime.cpython-313d-x86_64-linux-gnu.so
...
124 if (PyDict_GetItemRef(dict, INTERP_KEY, &ref) < 0) {
0x000000000000860f <+60>: 48 8d 55 e0 lea rdx,[rbp-0x20]
0x0000000000008613 <+64>: 48 8b 05 2e 39 01 00 mov rax,QWORD PTR [rip+0x1392e] # 0x1bf48
0x000000000000861a <+71>: 48 8d b0 60 01 01 00 lea rsi,[rax+0x10160] # <_PyRuntime+0x10160> (rsi -> arg2 = INTERP_KEY)
0x0000000000008621 <+78>: e8 6a e1 ff ff call 0x6790 <PyDict_GetItemRef@plt>
0x0000000000008626 <+83>: 85 c0 test eax,eax
0x0000000000008628 <+85>: 0f 88 90 00 00 00 js 0x86be <get_current_module+235>
...
$ gdb -batch -ex "disassemble/mr get_current_module" Modules/_datetime.cpython-313d-x86_64-linux-gnu.so
...
124 if (PyDict_GetItemRef(dict, INTERP_KEY, &ref) < 0) {
0x000000000000864f <+60>: 48 8d 55 e0 lea rdx,[rbp-0x20]
0x0000000000008653 <+64>: 48 8b 05 ee 38 01 00 mov rax,QWORD PTR [rip+0x138ee] # 0x1bf48
0x000000000000865a <+71>: 48 8d b0 90 01 01 00 lea rsi,[rax+0x10190] # <_PyRuntime+0x10190> (rsi -> arg2 = INTERP_KEY)
0x0000000000008661 <+78>: e8 3a e1 ff ff call 0x67a0 <PyDict_GetItemRef@plt>
0x0000000000008666 <+83>: 85 c0 test eax,eax
0x0000000000008668 <+85>: 0f 88 90 00 00 00 js 0x86fe <get_current_module+235>
... We can also confirm this with a GDB session: (gdb) n
124 if (PyDict_GetItemRef(dict, INTERP_KEY, &ref) < 0) {
(gdb) s
PyDict_GetItemRef (op={}, key=<unknown at remote 0x7ffff7f04370>, result=0x7fffffff1990) at /usr/src/debug/python3.13-3.13.0-1.fc41.x86_64/Objects/dictobject.c:2336
2336 if (!PyDict_Check(op)) {
(gdb) x &_PyRuntime.static_objects.singletons.strings.identifiers._py_cached_datetime_module
0x7ffff7f04340 <_PyRuntime+65888>: 0xffffffff
(gdb) x key
0x7ffff7f04370 <_PyRuntime+65936>: 0x74657461
(gdb) p/x 0x7ffff7f04370 - 0x7ffff7f04340
$2 = 0x30 The difference is the size of the new field: (gdb) p/x sizeof _PyRuntime.static_objects.singletons.strings.literals._py_str_replace_inf
$4 = 0x30 After the RPM upgrade, the mapped files for the running process (shared libraries, etc.) are deleted (marked as Thus, the issue arises because the |
Another solution would be to create a version for the internal C API and check this version when an extension module is imported. |
Oh! I dismissed this at first, but thinking about it more, it might be workable. It ties in with another design I'm doing right now: checks for stable ABI. Please give me about 2 weeks come up with a PEP draft :) |
Bug report
Bug description:
This was reported to us in Fedora by a user and the reproducer was simplified by me. This is probably a hard problem to fix.
Anyway:
email.parser
(EDIT: or_datetime
)E.g. on Fedora 41:
sudo dnf --repo=fedora install python3
), runpython3
, press Ctrl+zsudo dnf upgrade python3
)fg
andimport email.parser
This Fedora reproducer benefits form the fact that Python 3.13.0 is in the
fedora
dnf repo and 3.13.1+ is in theupdates
repo. On different Fedora versions, you might need to search for the right build in https://koji.fedoraproject.org/koji/search?match=exact&type=package&terms=python3.13To get a reasonable backtrace from gdb, install python3-debug and the debuginfo packages first for the old Python:
sudo dnf --repo=fedora install python3-debug
sudo dnf --repo=fedora debuginfo-install python3-debug
gdb python3-debug
,run
, Ctrl+z, Ctrl+zsudo dnf upgrade python3-debug
fg
,fg
,fg
,import email.parser
It is possible other modules are affected as well.
User assumptions
ImportError
) rather than segfault.CPython versions tested on:
3.13
Operating systems tested on:
Linux
The text was updated successfully, but these errors were encountered: