Race condition when sending release message in SharedBufferFrame destructor #289

Open
GDYendell opened this issue Oct 11, 2021 · 0 comments

GDYendell commented Oct 11, 2021

Edit: We have actually seen this once before, in February 2021, also on I03.

This happened on I03 recently. I think it must be very rare, as this code has not changed for a long time now and we have never seen it before. I expect it will be very difficult to reproduce, but perhaps we can review the code and consider where it might be possible to go wrong.

There are a couple of issues with the same error (zeromq/libzmq#4233)

Assertion failed: refs_ >= 0

but the suggestion there is that user error is the cause, which seems most likely.

The frame receiver was then missing a memory buffer until restarted.
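
As a starting point for the code review, here is a minimal sketch of one thing to check: whether the channel used in the destructor can be reached from more than one thread. This is an assumption, not something the trace establishes. ZeroMQ sockets are documented as not thread-safe, so a conventional guard is to serialise every send behind a mutex. `GuardedChannel`, `DemoFrame` and `release_from_threads` are hypothetical names for illustration and are not odin-data classes; the vector stands in for the real socket.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a channel wrapping one ZeroMQ socket.
// Every send takes the same mutex, so two threads never touch the
// underlying socket at the same time.
class GuardedChannel {
public:
    void send(const std::string& msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        sent_.push_back(msg);  // a real channel would call the socket send here
    }

    std::size_t sent_count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return sent_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> sent_;
};

// Hypothetical frame that sends a release notification from its destructor,
// on whichever thread happens to drop the last shared_ptr reference.
class DemoFrame {
public:
    DemoFrame(int buffer_id, GuardedChannel& channel)
        : buffer_id_(buffer_id), channel_(channel) {}

    DemoFrame(const DemoFrame&) = delete;
    DemoFrame& operator=(const DemoFrame&) = delete;

    ~DemoFrame() {
        channel_.send("release " + std::to_string(buffer_id_));
    }

private:
    int buffer_id_;
    GuardedChannel& channel_;
};

// Create and destroy frames on several worker threads that share one channel,
// then report how many release messages were recorded.
std::size_t release_from_threads(int n_threads, int frames_per_thread) {
    GuardedChannel channel;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&channel, t, frames_per_thread] {
            for (int i = 0; i < frames_per_thread; ++i) {
                std::shared_ptr<DemoFrame> frame =
                    std::make_shared<DemoFrame>(t * frames_per_thread + i, channel);
                // The last reference is dropped at the end of this scope,
                // so the destructor (and the send) runs on this worker thread.
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return channel.sent_count();
}
```

Calling `release_from_threads(4, 100)` destroys 400 frames across four threads through a single channel. Without the mutex the same pattern would be a data race on the shared state. Whether the real release channel is shared in this way still needs to be confirmed in the code.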

[Sat Oct  9 19:07:29 2021] 19:07:29,647 FP.Acquisition INFO  - Closing file /dls/i03/data/2021/nt23570-97/xraycentring/auto/Fab58/Fab58_H11_1_X1/Fab58_H11_1_X1_1_000001.h5
[Sat Oct  9 19:07:29 2021]Assertion failed: refs_ >= 0 (src/msg.cpp:337)
[Sat Oct  9 19:07:29 2021]Caught signal 6 (SIGABRT)
[Sat Oct  9 19:07:29 2021]stack trace:
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(OdinData::print_stack_trace(_IO_FILE*, unsigned int) 0xca)[0x499946]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(OdinData::abort_handler(int, siginfo_t*, void*) 0xc6)[0x499a74]
[Sat Oct  9 19:07:29 2021]/lib64/libpthread.so.0( 0xf630)[0x7fd685511630]
[Sat Oct  9 19:07:29 2021]/lib64/libc.so.6(gsignal 0x37)[0x7fd68453e387]
[Sat Oct  9 19:07:29 2021]/lib64/libc.so.6(abort 0x148)[0x7fd68453fa78]
[Sat Oct  9 19:07:29 2021]/lib64/libzmq.so.5( 0x20709)[0x7fd686262709]
[Sat Oct  9 19:07:29 2021]/lib64/libzmq.so.5( 0x29f0a)[0x7fd68626bf0a]
[Sat Oct  9 19:07:29 2021]/lib64/libzmq.so.5( 0x1f8a3)[0x7fd6862618a3]
[Sat Oct  9 19:07:29 2021]/lib64/libzmq.so.5( 0x1fa22)[0x7fd686261a22]
[Sat Oct  9 19:07:29 2021]/lib64/libzmq.so.5( 0x56f61)[0x7fd686298f61]
[Sat Oct  9 19:07:29 2021]/lib64/libzmq.so.5( 0x42fd1)[0x7fd686284fd1]
[Sat Oct  9 19:07:29 2021]/lib64/libzmq.so.5( 0x59a8c)[0x7fd68629ba8c]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/lib/libOdinData.so(zmq::socket_t::send(zmq::message_t&, int) 0x2f)[0x7fd6857f41fb]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/lib/libOdinData.so(OdinData::IpcChannel::send(char const*, int, std::string const&) 0x93)[0x7fd6857f32dd]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/lib/libFrameProcessor.so(FrameProcessor::SharedBufferFrame::~SharedBufferFrame() 0x154)[0x7fd687d611c8]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(void boost::checked_delete<FrameProcessor::SharedBufferFrame>(FrameProcessor::SharedBufferFrame*) 0x1e)[0x4978f7]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(boost::detail::sp_counted_impl_p<FrameProcessor::SharedBufferFrame>::dispose() 0x1c)[0x499466]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(boost::detail::sp_counted_base::release() 0x42)[0x475784]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(boost::detail::shared_count::~shared_count() 0x27)[0x475847]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(boost::shared_ptr<FrameProcessor::Frame>::~shared_ptr() 0x1c)[0x4949b4]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/lib/libFrameProcessor.so(FrameProcessor::IFrameCallback::workerTask() 0xab)[0x7fd687d64403]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/lib/libFrameProcessor.so(boost::_mfi::mf0<void, FrameProcessor::IFrameCallback>::operator()(FrameProcessor::IFrameCallback*) const 0x65)[0x7fd687d68c13]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/lib/libFrameProcessor.so(void boost::_bi::list1<boost::_bi::value<FrameProcessor::IFrameCallback*> >::operator()<boost::_mfi::mf0<void, FrameProcessor::IFrameCallback>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, FrameProcessor::IFrameCallback>&, boost::_bi::list0&, int) 0x4a)[0x7fd687d68b76]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/lib/libFrameProcessor.so(boost::_bi::bind_t<void, boost::_mfi::mf0<void, FrameProcessor::IFrameCallback>, boost::_bi::list1<boost::_bi::value<FrameProcessor::IFrameCallback*> > >::operator()() 0x39)[0x7fd687d68b25]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/lib/libFrameProcessor.so(boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, FrameProcessor::IFrameCallback>, boost::_bi::list1<boost::_bi::value<FrameProcessor::IFrameCallback*> > > >::run() 0x1e)[0x7fd687d68aa8]
[Sat Oct  9 19:07:29 2021]/lib64/libboost_thread-mt.so.1.53.0( 0xd1da)[0x7fd686fb81da]
[Sat Oct  9 19:07:29 2021]/lib64/libpthread.so.0( 0x7ea5)[0x7fd685509ea5]
[Sat Oct  9 19:07:29 2021]/lib64/libc.so.6(clone 0x6d)[0x7fd6846069fd]
[Sat Oct  9 19:07:29 2021]terminate called after throwing an instance of 'OdinData::IpcReactorException'
[Sat Oct  9 19:07:29 2021]  what():  IpcReactor error while polling: Context was terminated
[Sat Oct  9 19:07:29 2021]Caught signal 6 (SIGABRT)
[Sat Oct  9 19:07:29 2021]stack trace:
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(OdinData::print_stack_trace(_IO_FILE*, unsigned int) 0xca)[0x499946]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(OdinData::abort_handler(int, siginfo_t*, void*) 0xc6)[0x499a74]
[Sat Oct  9 19:07:29 2021]/lib64/libpthread.so.0( 0xf630)[0x7fd685511630]
[Sat Oct  9 19:07:29 2021]/lib64/libc.so.6(gsignal 0x37)[0x7fd68453e387]
[Sat Oct  9 19:07:29 2021]/lib64/libc.so.6(abort 0x148)[0x7fd68453fa78]
[Sat Oct  9 19:07:29 2021]/lib64/libstdc++.so.6(__gnu_cxx::__verbose_terminate_handler() 0x165)[0x7fd684e4ea95]
[Sat Oct  9 19:07:29 2021]/lib64/libstdc++.so.6( 0x5ea06)[0x7fd684e4ca06]
[Sat Oct  9 19:07:29 2021]/lib64/libstdc++.so.6( 0x5ea33)[0x7fd684e4ca33]
[Sat Oct  9 19:07:29 2021]/lib64/libstdc++.so.6( 0x5ec53)[0x7fd684e4cc53]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/lib/libOdinData.so(OdinData::IpcReactor::run() 0x39f)[0x7fd68581f4ab]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(FrameProcessor::FrameProcessorController::runIpcService() 0x247)[0x47427d]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(boost::_mfi::mf0<void, FrameProcessor::FrameProcessorController>::operator()(FrameProcessor::FrameProcessorController*) const 0x65)[0x48fc4f]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(void boost::_bi::list1<boost::_bi::value<FrameProcessor::FrameProcessorController*> >::operator()<boost::_mfi::mf0<void, FrameProcessor::FrameProcessorController>, boost::_bi::list0>(boost::_bi::type<void>, boost::_mfi::mf0<void, FrameProcessor::FrameProcessorController>&, boost::_bi::list0&, int) 0x4a)[0x48c742]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(boost::_bi::bind_t<void, boost::_mfi::mf0<void, FrameProcessor::FrameProcessorController>, boost::_bi::list1<boost::_bi::value<FrameProcessor::FrameProcessorController*> > >::operator()() 0x39)[0x48b90b]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf0<void, FrameProcessor::FrameProcessorController>, boost::_bi::list1<boost::_bi::value<FrameProcessor::FrameProcessorController*> > > >::run() 0x1e)[0x490e64]
[Sat Oct  9 19:07:29 2021]/lib64/libboost_thread-mt.so.1.53.0( 0xd1da)[0x7fd686fb81da]
[Sat Oct  9 19:07:29 2021]/lib64/libpthread.so.0( 0x7ea5)[0x7fd685509ea5]
[Sat Oct  9 19:07:29 2021]/lib64/libc.so.6(clone 0x6d)[0x7fd6846069fd]
[Sat Oct  9 19:07:29 2021]Caught signal 11 (SIGSEGV)
[Sat Oct  9 19:07:29 2021]stack trace:
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(OdinData::print_stack_trace(_IO_FILE*, unsigned int) 0xca)[0x499946]
[Sat Oct  9 19:07:29 2021]/dls_sw/prod/tools/RHEL7-x86_64/odin-data/1-6-0dls3/prefix/bin/frameProcessor(OdinData::abort_handler(int, siginfo_t*, void*) 0xc6)[0x499a74]
[Sat Oct  9 19:07:29 2021]/lib64/libpthread.so.0( 0xf630)[0x7fd685511630]
[Sat Oct  9 19:07:29 2021][0x1b770c0]