Triton crashes with SIGSEGV #7938

Open
ctxqlxs opened this issue Jan 15, 2025 · 1 comment
Labels
crash Related to server crashes, segfaults, etc.

Comments

ctxqlxs commented Jan 15, 2025

Description
0x00007ff0d06cc4dc in triton::backend::python::SharedMemoryManager::Load<triton::backend::python::StringShm> (this=0x7fef84012020, handle=-4611686018427387901, unsafe=false) at /tmp/tritonbuild/python/src/shm_manager.h:158
warning: 158 /tmp/tritonbuild/python/src/shm_manager.h: No such file or directory
With a debug build, the backtrace looks like this:

(gdb) bt
#0  0x00007ff0d06cc4dc in triton::backend::python::SharedMemoryManager::Load<triton::backend::python::StringShm> (this=0x7fef84012020, handle=-4611686018427387901, unsafe=false) at /tmp/tritonbuild/python/src/shm_manager.h:158
#1  0x00007ff0d06cbbea in triton::backend::python::PbString::LoadFromSharedMemory (shm_pool=std::unique_ptr<triton::backend::python::SharedMemoryManager> = {...}, handle=-4611686018427387901) at /tmp/tritonbuild/python/src/pb_string.cc:69
#2  0x00007ff0d06c253e in triton::backend::python::InferRequest::LoadFromSharedMemory (shm_pool=std::unique_ptr<triton::backend::python::SharedMemoryManager> = {...}, request_handle=560304, open_cuda_handle=false, is_model_decoupled=0x0)
    at /tmp/tritonbuild/python/src/infer_request.cc:294
#3  0x00007ff0d05fb1d9 in triton::backend::python::ModelInstanceState::ExecuteBLSRequest (this=0x7fef84012110, ipc_message=std::shared_ptr<triton::backend::python::IPCMessage> (use count 2, weak count 0) = {...}, is_decoupled=false)
    at /tmp/tritonbuild/python/src/python_be.cc:596
#4  0x00007ff0d05fc762 in operator() (__closure=0x7fefd407a6d8) at /tmp/tritonbuild/python/src/python_be.cc:761
#5  0x00007ff0d060fa53 in std::__invoke_impl<void, triton::backend::python::ModelInstanceState::StubToParentMQMonitor()::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/13/bits/invoke.h:61
#6  0x00007ff0d060f6bc in std::__invoke_r<void, triton::backend::python::ModelInstanceState::StubToParentMQMonitor()::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/13/bits/invoke.h:111
#7  0x00007ff0d060f177 in operator() (__closure=0x7fef21ff73f8) at /usr/include/c++/13/future:1491
#8  0x00007ff0d06109b0 in std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::StubToParentMQMonitor()::<lambda()>, std::allocator<int>, void()>::_M_run()::<lambda()>, void>::operator()(void) const (this=0x7fef21ff7410) at /usr/include/c++/13/future:1432
#9  0x00007ff0d061068b in std::__invoke_impl<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::StubToParentMQMonitor()::<lambda()>, std::allocator<int>, void()>::_M_run()::<lambda()>, void>&>(std::__invoke_other, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::StubToParentMQMonitor()::<lambda()>, std::allocator<int>, void()>::_M_run()::<lambda()>, void> &) (__f=...)
    at /usr/include/c++/13/bits/invoke.h:61
#10 0x00007ff0d0610014 in std::__invoke_r<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::StubToParentMQMonitor()::<lambda()>, std::allocator<int>, void()>::_M_run()::<lambda()>, void>&>(std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::StubToParentMQMonitor()::<lambda()>, std::allocator<int>, void()>::_M_run()::<lambda()>, void> &) (__fn=...) at /usr/include/c++/13/bits/invoke.h:116
#11 0x00007ff0d060faf3 in std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>(), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, std::__future_base::_Task_state<triton::backend::python::ModelInstanceState::StubToParentMQMonitor()::<lambda()>, std::allocator<int>, void()>::_M_run()::<lambda()>, void> >::_M_invoke(const std::_Any_data &) (__functor=...)
    at /usr/include/c++/13/bits/std_function.h:291
#12 0x00007ff0d0625842 in std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const (this=0x7fef21ff7410) at /usr/include/c++/13/bits/std_function.h:591
#13 0x00007ff0d061a1aa in std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*) (this=0x7fefd407a6b0, __f=0x7fef21ff7410, __did_set=0x7fef21ff735f)
    at /usr/include/c++/13/future:589
#14 0x00007ff0d063ccb9 in std::__invoke_impl<void, void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(std::__invoke_memfun_deref, void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) (
    __f=@0x7fef21ff73a0: (void (std::__future_base::_State_baseV2::*)(std::__future_base::_State_baseV2 * const, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()> *, bool *)) 0x7ff0d061a170 <std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)>, __t=@0x7fef21ff7370: 0x7fefd407a6b0) at /usr/include/c++/13/bits/invoke.h:74
#15 0x00007ff0d063199b in std::__invoke<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*>(void (std::__future_base::_State_baseV2::*&&)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*), std::__future_base::_State_baseV2*&&, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*&&, bool*&&) (
    __fn=@0x7fef21ff73a0: (void (std::__future_base::_State_baseV2::*)(std::__future_base::_State_baseV2 * const, std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter>()> *, bool *)) 0x7ff0d061a170 <std::__future_base:--Type <RET> for more, q to quit, c to continue without paging--
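
As a sanity check on the handle passed to SharedMemoryManager::Load in frame #0, the short sketch below reinterprets that signed value as an unsigned 64-bit integer and compares it with the request_handle from frame #2. It assumes the handle is meant to be a small offset into the shared memory pool, which this report does not confirm; the variable names are illustrative only.

```python
# Values copied verbatim from the backtrace above.
handle = -4611686018427387901   # frame #0 and #1: handle
request_handle = 560304         # frame #2: request_handle

# View the same 64 bits as an unsigned integer (two's complement).
unsigned = handle & 0xFFFFFFFFFFFFFFFF
print(hex(unsigned))            # 0xc000000000000003

# The request handle, by contrast, is a small positive number.
print(hex(request_handle))      # 0x88cb0
```

The two top bits are set and only the low bits carry a small value, which looks more like corrupted or uninitialized data than a valid offset, though that is a guess from the numbers alone.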

Triton Information
What version of Triton are you using?
v24.12
Are you using the Triton container or did you build it yourself?
We used the official container, but it crashed, so we rebuilt it as a debug build with python3 build.py --build-type=Debug ... using the scripts from the Triton repo.

To Reproduce
Steps to reproduce the behavior.

Expected behavior
Triton should not crash.

rmccorm4 added the crash label on Jan 15, 2025
@rmccorm4
Contributor

CC @krishung5 @kthui for viz
