manager::Memory not thread safe #121

Open
michaelsippel opened this issue Oct 11, 2019 · 4 comments

michaelsippel commented Oct 11, 2019

I just noticed there is already an issue for this: #12

I ran into a memory error with PMacc-GoL in a multithreaded setup using the resource manager.

After investigating, I found that in cuplaMallocHost() the variable buf is a nullptr. It comes from manager::Memory::alloc(), which makes a reference out of a pointer (read from a map).
The variable manager::Memory::m_mapVector is not protected by a mutex, so concurrent calls to cuplaMallocHost() race on it, and one insert can be overwritten by another.
This could be solved with a simple mutex.
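The mutex idea can be sketched roughly as follows. This is only an illustration under assumptions, not cupla's actual code: `BufferRegistry`, its members, and `run_concurrent_allocs` are hypothetical names, and a plain `std::vector<unsigned char>` stands in for the alpaka buffer type. The point is that every access to the map goes through one lock.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a buffer registry; NOT cupla's manager::Memory.
class BufferRegistry
{
public:
    using Buffer = std::vector<unsigned char>;

    // Allocates a buffer, records it in the map, and returns a reference to it.
    Buffer& alloc(std::size_t size)
    {
        assert(size > 0);
        auto buf = std::make_unique<Buffer>(size);
        Buffer& ref = *buf;
        // The lock serializes all access to m_map. Without it, concurrent
        // inserts into std::map are a data race (undefined behavior).
        std::lock_guard<std::mutex> lock(m_mutex);
        m_map.emplace(ref.data(), std::move(buf));
        return ref;
    }

    // Number of buffers currently recorded.
    std::size_t size()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_map.size();
    }

private:
    std::mutex m_mutex;
    std::map<unsigned char*, std::unique_ptr<Buffer>> m_map;
};

// Spawns `threads` threads that each perform `perThread` allocations and
// returns the number of entries recorded in the registry afterwards.
std::size_t run_concurrent_allocs(int threads, int perThread)
{
    BufferRegistry registry;
    std::vector<std::thread> pool;
    for(int t = 0; t < threads; ++t)
        pool.emplace_back([&registry, perThread] {
            for(int i = 0; i < perThread; ++i)
                registry.alloc(8);
        });
    for(auto& th : pool)
        th.join();
    return registry.size();
}
```

With the lock in place, the number of recorded entries should match the number of allocations performed. Removing the `std::lock_guard` in `alloc` reintroduces the kind of race described above.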

In cuplaMallocHost():

(gdb) select-frame 13
(gdb) info locals
extent = {static s_uiDim = <optimized out>, m_data = {8}}
buf = <error reading variable>
(gdb) p &buf
$2 = (alpaka::mem::buf::BufCpu<unsigned char, std::integral_constant<unsigned long, 1>, unsigned long> *) 0x0

So here is the relevant backtrace from my case:

#10 0x00005555556746da in
std::__shared_ptr_access<
    alpaka::mem::buf::cpu::detail::BufCpuImpl<
        unsigned char,
        std::integral_constant<unsigned long, 1ul>,
        unsigned long
    >,
    (__gnu_cxx::_Lock_policy)2,
    false,
    false
>::operator-> (this=0x0)
at /usr/include/c++/9.1.0/bits/shared_ptr_base.h:1015

#11 0x0000555555672c5e in
alpaka::mem::view::traits::GetPtrNative<
    alpaka::mem::buf::BufCpu<
        unsigned char,
        std::integral_constant<unsigned long, 1ul>,
        unsigned long
    >,
    void
>::getPtrNative
at .../alpaka/include/alpaka/mem/buf/BufCpu.hpp:291

#12 0x0000555555671279 in
alpaka::mem::view::getPtrNative<
    alpaka::mem::buf::BufCpu<
        unsigned char,
        std::integral_constant<unsigned long, 1ul>,
        unsigned long
    >
>
at .../alpaka/include/alpaka/mem/view/Traits.hpp:202

#13 0x000055555566f0ba in cupla_omp2_seq_async::cuplaMallocHost (ptrptr=0x5555558c0ba0, size=8)

@sbastrakov
Member

@michaelsippel thanks for reporting. Indeed, the cupla API is currently not thread-safe, and we need to fix that some day.

@sbastrakov
Member

@psychocoderHPC @ax3l do you think we should add a note to the README (or somewhere else) that the cupla API is not thread-safe, as was correctly pointed out in this issue?

@ax3l
Member

ax3l commented Feb 4, 2020

Yes, this should be documented in the docs, including a few words on what it implies and how to use the API. (Just stating that it is not thread-safe might be too brief for non-computer scientists.)
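As one possible illustration of what "not thread-safe" implies for callers: until the library synchronizes internally, user code can serialize its own calls through a single mutex. The sketch below is an assumption-laden example, not cupla code. `unsafe_alloc_host` is a hypothetical stand-in for any non-thread-safe API call, and `locked_alloc_host` and `run_serialized_allocs` are made-up helper names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a non-thread-safe API call: it updates
// unsynchronized global bookkeeping, so concurrent calls would race.
static std::vector<void*> g_allocations;

void unsafe_alloc_host(void** ptr, std::size_t size)
{
    *ptr = std::malloc(size);
    g_allocations.push_back(*ptr);
}

// One process-wide mutex guarding every call into the unsafe API.
static std::mutex g_apiMutex;

// Caller-side wrapper: only one thread at a time enters the unsafe call.
void locked_alloc_host(void** ptr, std::size_t size)
{
    std::lock_guard<std::mutex> lock(g_apiMutex);
    unsafe_alloc_host(ptr, size);
}

// Runs `threads` threads that each allocate `perThread` times through the
// wrapper, then frees everything and returns how many allocations were recorded.
std::size_t run_serialized_allocs(int threads, int perThread)
{
    std::vector<std::thread> pool;
    for(int t = 0; t < threads; ++t)
        pool.emplace_back([perThread] {
            for(int i = 0; i < perThread; ++i)
            {
                void* p = nullptr;
                locked_alloc_host(&p, 8);
            }
        });
    for(auto& th : pool)
        th.join();

    std::size_t const count = g_allocations.size();
    for(void* p : g_allocations)
        std::free(p);
    g_allocations.clear();
    return count;
}
```

The trade-off is that all guarded calls run one at a time, and every call site that touches the same shared state has to go through the same mutex for the scheme to work.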

@sbastrakov
Member

@ax3l okay, I could provide a PR tomorrow.
