Memory profiling causes rocmIsEnabled to segfault #47450
Comments
assign core,heterogeneous |
New categories assigned: core,heterogeneous @Dr15Jones,@fwyzard,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks |
cms-bot internal usage |
A new Issue was created by @iarspider. @Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
This was discovered when preparing cms-sw/cms-bot#2418. |
@iarspider what was the command you issued to start gdb? I do not understand what you meant when you wrote "if --maxmem_profile is passed to cmsRun" as |
It seems to be the argument for |
So I looked at the output of one of the failing RelVals in the PR in question. The log contains
So
so that is the origin of the call to the stand alone binary |
We have seen this behavior also before #45964 (comment) |
So I ran
a dozen or so times at FNAL using a
What is needed to get a consistent (or at least a probable) crash? |
@Dr15Jones I got this crash on a node (LUMI) with a ROCm-enabled GPU using CMSSW_15_1_X_2025-02-23-0000 (but I think only the first part is important). |
It seems to be technically possible to avoid passing the |
Of course avoiding the crash in |
I vaguely recall the MaxMemoryPreload was supposed to be run only on select IB flavors (I can't find the discussion though, I did find the cms-bot PR adding the use of |
@gartung mentioned he experienced a crash in |
@gartung this actually works for me on a bare metal node, with a Radeon Pro W7800:
$ cd /data/cmssw/el8_amd64_gcc12/cms/cmssw/CMSSW_15_0_0_pre3
$ cmsenv
$ LD_PRELOAD=libPerfToolsAllocMonitorPreload.so:libPerfToolsMaxMemoryPreload.so rocmIsEnabled
Memory Report: total memory requested: 231066386
Memory Report: max memory used: 14799096
Memory Report: presently used: 8
Memory Report: # allocations calls: 732952
Memory Report: # deallocations calls: 741971
$ LD_PRELOAD=libPerfToolsAllocMonitorPreload.so:libPerfToolsMaxMemoryPreload.so rocmComputeCapabilities
0 gfx1100 AMD Radeon PRO W7800
Memory Report: total memory requested: 231094400
Memory Report: max memory used: 14799304
Memory Report: presently used: 8
Memory Report: # allocations calls: 733031
Memory Report: # deallocations calls: 743723
Update: on this node it also works within an Alma 8 or Alma 9 container:
Singularity> rocmIsEnabled
Singularity> LD_PRELOAD=libPerfToolsAllocMonitorPreload.so:libPerfToolsMaxMemoryPreload.so rocmComputeCapabilities
0 gfx1100 AMD Radeon Graphics
Memory Report: total memory requested: 231103916
Memory Report: max memory used: 14799200
Memory Report: presently used: 8
Memory Report: # allocations calls: 733223
Memory Report: # deallocations calls: 743916 |
I confirm that it does fail on LUMI, in an Alma 8 container:
$ rocmComputeCapabilities
0 gfx90a:sramecc+:xnack- AMD Instinct MI250X
$ LD_PRELOAD=libPerfToolsAllocMonitorPreload.so:libPerfToolsMaxMemoryPreload.so rocmComputeCapabilities
Segmentation fault |
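Since the crash shows up on some nodes and not on others, it may help to measure how often it happens rather than rely on a single run. The following is a minimal sketch, not something taken from the thread: `count_segfaults` is a hypothetical helper name, and it assumes the shell reports a process killed by SIGSEGV with exit status 139 (128 + signal 11), which is the usual convention.

```shell
# Hypothetical helper (not part of CMSSW or cms-bot): run a command N times
# and count how many runs ended with exit status 139 (128 + SIGSEGV).
# Usage: count_segfaults N command [args...]
# The body is a subshell, so its variables do not leak into the caller.
count_segfaults() (
  runs=$1
  shift
  crashes=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    status=0
    # Discard the command's output; only the exit status matters here.
    "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 139 ]; then
      crashes=$((crashes + 1))
    fi
    i=$((i + 1))
  done
  # Print "<crashes>/<runs>".
  echo "$crashes/$runs"
)
```

A possible invocation, after setting up the release environment, would be `count_segfaults 20 env LD_PRELOAD=libPerfToolsAllocMonitorPreload.so:libPerfToolsMaxMemoryPreload.so rocmIsEnabled`. Going through `env` keeps the preload confined to the tested binary instead of the calling shell.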
I was using |
@fwyzard |
FWIW I opened a draft PR to do that #47452 |
Ehr... with a debug build (I used CMSSW_15_1_DBG_X_2025-02-19-2300) the LD_PRELOAD command does not crash:
andbocci@nid007977:/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_DBG_X_2025-02-19-2300$ cmsenv
andbocci@nid007977:/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_DBG_X_2025-02-19-2300$ LD_PRELOAD=libPerfToolsAllocMonitorPreload.so:libPerfToolsMaxMemoryPreload.so rocmIsEnabled
Memory Report: total memory requested: 248269565
Memory Report: max memory used: 14760840
Memory Report: presently used: 16
Memory Report: # allocations calls: 860831
Memory Report: # deallocations calls: 869850
andbocci@nid007977:/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_DBG_X_2025-02-19-2300$ echo $?
0 |
Actually it also works with the same non-DEBUG IB:
andbocci@nid007977:/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-19-2300$ cmsenv
andbocci@nid007977:/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-19-2300$ LD_PRELOAD=libPerfToolsAllocMonitorPreload.so:libPerfToolsMaxMemoryPreload.so rocmIsEnabled
Memory Report: total memory requested: 248052703
Memory Report: max memory used: 14760776
Memory Report: presently used: 8
Memory Report: # allocations calls: 860799
Memory Report: # deallocations calls: 869818
andbocci@nid007977:/cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-19-2300$ echo $?
0 |
It actually works on all releases earlier than |
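Comparing several IBs by hand, as in the sessions above, can be scripted. This is only a sketch under stated assumptions: `run_in_each` is a hypothetical helper name, the command string is passed to `sh -c`, and the environment setup that `cmsenv` performs interactively is not reproduced here, so it would have to be spelled out at the start of the command string.

```shell
# Hypothetical helper (not part of CMSSW or cms-bot): run the same command
# string in each listed directory and print "<exit status> <directory>".
# Usage: run_in_each 'command string' dir1 [dir2 ...]
run_in_each() (
  cmd=$1
  shift
  for dir in "$@"; do
    status=0
    # Inner subshell: the cd and any environment changes made by the
    # command do not carry over to the next directory.
    (cd "$dir" && sh -c "$cmd") >/dev/null 2>&1 || status=$?
    echo "$status $dir"
  done
)
```

A status of 0 would correspond to the `echo $?` results shown for the 2025-02-19-2300 IBs. A status of 139 would typically indicate a segmentation fault, again assuming the usual 128 + signal convention.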
TBB (version v2022.0.0) was updated for |
Mhm 🤔 |
Anyway, here is the GDB stack trace from
$ gdb -ex 'set pagination off' -ex 'set environment LD_PRELOAD libPerfToolsAllocMonitorPreload.so:libPerfToolsMaxMemoryPreload.so' -ex r -ex bt rocmIsEnabled
...
Thread 1 "rocmIsEnabled" received signal SIGSEGV, Segmentation fault.
0x00001555538a8a4c in _int_free () from /lib64/libc.so.6
#0 0x00001555538a8a4c in _int_free () from /lib64/libc.so.6
#1 0x00001555555441ac in operator delete(void*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/lib/el8_amd64_gcc12/libPerfToolsAllocMonitorPreload.so
#2 0x000015554f8dd24d in llvm::GenericCycleInfoCompute<llvm::GenericSSAContext<llvm::Function> >::updateDepth(llvm::GenericCycle<llvm::GenericSSAContext<llvm::Function> >*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#3 0x000015554f8ddd93 in llvm::GenericCycleInfoCompute<llvm::GenericSSAContext<llvm::Function> >::run(llvm::BasicBlock*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#4 0x000015554f8dfc22 in llvm::CycleInfoWrapperPass::runOnFunction(llvm::Function&) [clone .localalias.5] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#5 0x0000155550035c69 in llvm::FPPassManager::runOnFunction(llvm::Function&) [clone .localalias.4] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#6 0x0000155550035db1 in llvm::FPPassManager::runOnModule(llvm::Module&) [clone .localalias.54] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#7 0x0000155550036a7f in llvm::legacy::PassManagerImpl::run(llvm::Module&) [clone .localalias.36] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#8 0x000015554c0695ac in clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >, clang::BackendConsumer*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#9 0x000015554c0454a1 in clang::CodeGenAction::ExecuteAction() [clone .localalias.40] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#10 0x000015554db23851 in clang::FrontendAction::Execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#11 0x000015554daaf2fa in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) [clone .localalias.2] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#12 0x000015554bb9e673 in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#13 0x000015554af7fd6b in COMGR::AMDGPUCompiler::executeInProcessDriver(llvm::ArrayRef<char const*>) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#14 0x000015554af81fdc in COMGR::AMDGPUCompiler::processFile(char const*, char const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#15 0x000015554af82618 in COMGR::AMDGPUCompiler::processFiles(amd_comgr_data_kind_s, char const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#16 0x000015554af9356d in amd_comgr_do_action () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#17 0x0000155553e8c205 in amd::device::Program::compileAndLinkExecutable(amd_comgr_data_set_s, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, amd::option::Options*, char**, unsigned long*, amd::device::Program::file_type_t) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#18 0x0000155553e8ebd4 in amd::device::Program::linkImplLC(amd::option::Options*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#19 0x0000155553e8b141 in amd::device::Program::build(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, amd::option::Options*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#20 0x0000155553eb4b26 in amd::Program::build(std::vector<amd::Device*, std::allocator<amd::Device*> > const&, char const*, void (*)(_cl_program*, void*), void*, bool, bool) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#21 0x0000155553e85ded in amd::Device::BlitProgram::create(amd::Device*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#22 0x0000155553ec2edb in amd::roc::Device::createBlitProgram() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#23 0x0000155553f06aa8 in amd::roc::KernelBlitManager::createProgram(amd::roc::Device&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#24 0x0000155553edb4cd in amd::roc::VirtualGPU::create() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#25 0x0000155553ebce08 in amd::roc::Device::createVirtualDevice(amd::CommandQueue*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#26 0x0000155553ea9f74 in amd::HostQueue::HostQueue(amd::Context&, amd::Device&, unsigned long, unsigned int, amd::CommandQueue::Priority, std::vector<unsigned int, std::allocator<unsigned int> > const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#27 0x0000155553e05e19 in hip::Stream::Stream(hip::Device*, hip::Stream::Priority, unsigned int, bool, std::vector<unsigned int, std::allocator<unsigned int> > const&, hipStreamCaptureStatus) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#28 0x0000155553ca1c94 in hip::Device::NullStream(bool) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#29 0x0000155553d4dba7 in hip::ihipMemset(void*, long, unsigned long, unsigned long, ihipStream_t*, bool) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#30 0x0000155553d7ed2c in hip::hipMemset(void*, int, unsigned long) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#31 0x00000000004012fd in isRocmDeviceSupported(int) ()
#32 0x0000000000401190 in main () |
Playing with breakpoints, I can get a similar stack trace with
#0 0x00001555538a82a0 in _int_free () from /lib64/libc.so.6
#1 0x00001555555441ac in operator delete(void*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/lib/el8_amd64_gcc12/libPerfToolsAllocMonitorPreload.so
#2 0x000015554f8dd1fb in llvm::GenericCycleInfoCompute<llvm::GenericSSAContext<llvm::Function> >::updateDepth(llvm::GenericCycle<llvm::GenericSSAContext<llvm::Function> >*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#3 0x000015554f8ddd93 in llvm::GenericCycleInfoCompute<llvm::GenericSSAContext<llvm::Function> >::run(llvm::BasicBlock*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#4 0x000015554f8dfc22 in llvm::CycleInfoWrapperPass::runOnFunction(llvm::Function&) [clone .localalias.5] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#5 0x0000155550035c69 in llvm::FPPassManager::runOnFunction(llvm::Function&) [clone .localalias.4] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#6 0x0000155550035db1 in llvm::FPPassManager::runOnModule(llvm::Module&) [clone .localalias.54] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#7 0x0000155550036a7f in llvm::legacy::PassManagerImpl::run(llvm::Module&) [clone .localalias.36] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#8 0x000015554c0695ac in clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >, clang::BackendConsumer*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#9 0x000015554c0454a1 in clang::CodeGenAction::ExecuteAction() [clone .localalias.40] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#10 0x000015554db23851 in clang::FrontendAction::Execute() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#11 0x000015554daaf2fa in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) [clone .localalias.2] () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#12 0x000015554bb9e673 in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#13 0x000015554af7fd6b in COMGR::AMDGPUCompiler::executeInProcessDriver(llvm::ArrayRef<char const*>) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#14 0x000015554af81fdc in COMGR::AMDGPUCompiler::processFile(char const*, char const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#15 0x000015554af82618 in COMGR::AMDGPUCompiler::processFiles(amd_comgr_data_kind_s, char const*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#16 0x000015554af9356d in amd_comgr_do_action () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
#17 0x0000155553e8c205 in amd::device::Program::compileAndLinkExecutable(amd_comgr_data_set_s, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, amd::option::Options*, char**, unsigned long*, amd::device::Program::file_type_t) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#18 0x0000155553e8ebd4 in amd::device::Program::linkImplLC(amd::option::Options*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#19 0x0000155553e8b141 in amd::device::Program::build(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, amd::option::Options*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#20 0x0000155553eb4b26 in amd::Program::build(std::vector<amd::Device*, std::allocator<amd::Device*> > const&, char const*, void (*)(_cl_program*, void*), void*, bool, bool) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#21 0x0000155553e85ded in amd::Device::BlitProgram::create(amd::Device*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#22 0x0000155553ec2edb in amd::roc::Device::createBlitProgram() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#23 0x0000155553f06aa8 in amd::roc::KernelBlitManager::createProgram(amd::roc::Device&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#24 0x0000155553edb4cd in amd::roc::VirtualGPU::create() () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#25 0x0000155553ebce08 in amd::roc::Device::createVirtualDevice(amd::CommandQueue*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#26 0x0000155553ea9f74 in amd::HostQueue::HostQueue(amd::Context&, amd::Device&, unsigned long, unsigned int, amd::CommandQueue::Priority, std::vector<unsigned int, std::allocator<unsigned int> > const&) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#27 0x0000155553e05e19 in hip::Stream::Stream(hip::Device*, hip::Stream::Priority, unsigned int, bool, std::vector<unsigned int, std::allocator<unsigned int> > const&, hipStreamCaptureStatus) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#28 0x0000155553ca1c94 in hip::Device::NullStream(bool) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#29 0x0000155553d4dba7 in hip::ihipMemset(void*, long, unsigned long, unsigned long, ihipStream_t*, bool) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#30 0x0000155553d7ed2c in hip::hipMemset(void*, int, unsigned long) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02877/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-1100/external/el8_amd64_gcc12/lib/libamdhip64.so.6
#31 0x00000000004012fd in isRocmDeviceSupported(int) ()
#32 0x0000000000401190 in main ()
but then if I continue, it works:
(gdb) disable
(gdb) c
Continuing.
[Thread 0x155443dff700 (LWP 5789) exited]
[Thread 0x1555493ff700 (LWP 5787) exited]
Memory Report: total memory requested: 248083365
Memory Report: max memory used: 14764880
Memory Report: presently used: 0
Memory Report: # allocations calls: 860865
Memory Report: # deallocations calls: 869884
[Inferior 1 (process 5786) exited normally] |
And here is the top (bottom?) of the stack trace with CMSSW_15_1_X_2025-02-21-2300, after rebuilding the relevant packages with debug symbols:
#0 0x00001555538a8a4c in _int_free () from /lib64/libc.so.6
#1 0x00001555555441ac in operator()<void*> (ptr=0xcf04c0, __closure=<synthetic pointer>) at src/PerfTools/AllocMonitorPreload/src/memory_proxies.cc:326
#2 cms::perftools::AllocMonitorRegistry::deallocCalled<operator delete(void*)::<lambda(auto:26)>, operator delete(void*)::<lambda(auto:27)> > (iDealloc=..., iGetActual=..., iPtr=0xcf04c0, this=0x1555555320c0 <cms::perftools::AllocMonitorRegistry::instance()::s_registry>) at /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/src/PerfTools/AllocMonitor/interface/AllocMonitorRegistry.h:133
#3 operator delete (ptr=0xcf04c0) at src/PerfTools/AllocMonitorPreload/src/memory_proxies.cc:326
#4 operator delete (ptr=0xcf04c0) at src/PerfTools/AllocMonitorPreload/src/memory_proxies.cc:318
#5 0x000015554f8dd24d in llvm::GenericCycleInfoCompute<llvm::GenericSSAContext<llvm::Function> >::updateDepth(llvm::GenericCycle<llvm::GenericSSAContext<llvm::Function> >*) () from /cvmfs/cms-ib.cern.ch/sw/x86_64/week1/el8_amd64_gcc12/cms/cmssw/CMSSW_15_1_X_2025-02-21-2300/external/el8_amd64_gcc12/lib/libamd_comgr.so.2
... |
Output of
LD_PRELOAD=libPerfToolsAllocMonitorPreload.so:libPerfToolsMaxMemoryPreload.so gdb --args rocmIsEnabled
(these libraries are preloaded if --maxmem_profile is passed to cmsDriver):