
HPX serialization error with std::vector<std::vector<std::vector<float>>> #6623

Open
JiakunYan opened this issue Feb 24, 2025 · 3 comments · May be fixed by #6626



JiakunYan commented Feb 24, 2025

Opening this issue on behalf of @m-diers.

If the data structure to serialize is std::vector<std::vector<std::vector<float>>>, HPX crashes as soon as two nodes are used (even with two processes on one node). This time it happens with all of the parcelports: LCI, MPI, and TCP.

Here is the slightly adapted example:
https://gist.github.com/m-diers/78be383845516cc2d34e74a32632b672

@hkaiser It looks to me like a serialization issue rather than a parcelport one. Is the serialization code designed to handle multi-layer nested vectors?
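For reference, the usual size-prefixed, recursive scheme for serializing nested vectors can be sketched in plain C++ as follows. This is a hypothetical illustration of the general technique, not HPX's actual serialization code; the `save` functions and the byte-buffer layout are made up for this sketch:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Serialize a single float by copying its bytes into the buffer.
inline void save(std::vector<unsigned char>& buf, float v)
{
    unsigned char tmp[sizeof v];
    std::memcpy(tmp, &v, sizeof v);
    buf.insert(buf.end(), tmp, tmp + sizeof v);
}

// Serialize a vector as an 8-byte element count followed by its elements.
// For vector<vector<vector<float>>> this template matches twice before the
// float overload is reached -- the nesting depth falls out of the recursion
// and needs no special handling.
template <typename T>
void save(std::vector<unsigned char>& buf, std::vector<T> const& v)
{
    std::uint64_t const size = v.size();
    unsigned char tmp[sizeof size];
    std::memcpy(tmp, &size, sizeof size);
    buf.insert(buf.end(), tmp, tmp + sizeof size);
    for (T const& e : v)
        save(buf, e);
}
```

If that is roughly what happens, nesting depth alone should not break anything, which would be consistent with the problem sitting in the chunking/parcel layer rather than in the recursive serializer itself.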


hkaiser commented Feb 24, 2025

> @hkaiser It looks to me like a serialization issue rather than a parcelport one. Is the serialization code designed to handle multi-layer nested vectors?

I'll have a look. Thanks!


m-diers commented Feb 25, 2025

@hkaiser Some more information:
I tested with the changes from #6619:

  • std::vector<std::vector<float>>
    runs without problems so far, up to the memory limits of the nodes.
  • std::vector<std::vector<std::vector<float>>>
    runs with 812 x 812 x 812; starting from 813 x 812 x 812 it fails with
    terminate called after throwing an instance of 'std::length_error' what(): vector::_M_default_append


hkaiser commented Feb 26, 2025

@JiakunYan a first test shows that things work properly (for me) when using the TCP parcelport. With the MPI parcelport, I see the assertion failure below:


{config}:
Core library:
  HPX_AGAS_LOCAL_CACHE_SIZE=4096
  HPX_HAVE_MALLOC=jemalloc
  HPX_PARCEL_MAX_CONNECTIONS=512
  HPX_PARCEL_MAX_CONNECTIONS_PER_LOCALITY=4
  HPX_PREFIX (configured)=D:/Devel/hpx/build/All.v16.vcpkg
  HPX_PREFIX=D:\Devel\hpx\build\All.v16.vcpkg\Debug

  HPX_FILESYSTEM_WITH_BOOST_FILESYSTEM_COMPATIBILITY=OFF
  HPX_ITERATOR_SUPPORT_WITH_BOOST_ITERATOR_TRAVERSAL_TAG_COMPATIBILITY=OFF
  HPX_WITH_AGAS_DUMP_REFCNT_ENTRIES=OFF
  HPX_WITH_APEX=OFF
  HPX_WITH_ASYNC_MPI=OFF
  HPX_WITH_ATTACH_DEBUGGER_ON_TEST_FAILURE=OFF
  HPX_WITH_AUTOMATIC_SERIALIZATION_REGISTRATION=ON
  HPX_WITH_COMPRESSION_BZIP2=ON
  HPX_WITH_COMPRESSION_SNAPPY=ON
  HPX_WITH_COMPRESSION_ZLIB=ON
  HPX_WITH_COROUTINE_COUNTERS=OFF
  HPX_WITH_DISTRIBUTED_RUNTIME=ON
  HPX_WITH_IO_POOL=ON
  HPX_WITH_ITTNOTIFY=OFF
  HPX_WITH_LOGGING=ON
  HPX_WITH_NETWORKING=ON
  HPX_WITH_PAPI=OFF
  HPX_WITH_PARALLEL_TESTS_BIND_NONE=OFF
  HPX_WITH_PARCELPORT_ACTION_COUNTERS=ON
  HPX_WITH_PARCELPORT_LCI=OFF
  HPX_WITH_PARCELPORT_LIBFABRIC=OFF
  HPX_WITH_PARCELPORT_MPI=ON
  HPX_WITH_PARCELPORT_MPI_MULTITHREADED=ON
  HPX_WITH_PARCELPORT_TCP=ON
  HPX_WITH_PARCEL_COALESCING=ON
  HPX_WITH_PARCEL_PROFILING=OFF
  HPX_WITH_SANITIZERS=OFF
  HPX_WITH_SCHEDULER_LOCAL_STORAGE=OFF
  HPX_WITH_SPINLOCK_DEADLOCK_DETECTION=OFF
  HPX_WITH_STACKTRACES=ON
  HPX_WITH_TESTS_DEBUG_LOG=OFF
  HPX_WITH_THREAD_BACKTRACE_ON_SUSPENSION=OFF
  HPX_WITH_THREAD_CREATION_AND_CLEANUP_RATES=OFF
  HPX_WITH_THREAD_CUMULATIVE_COUNTS=ON
  HPX_WITH_THREAD_DEBUG_INFO=OFF
  HPX_WITH_THREAD_DESCRIPTION_FULL=OFF
  HPX_WITH_THREAD_GUARD_PAGE=OFF
  HPX_WITH_THREAD_IDLE_RATES=OFF
  HPX_WITH_THREAD_LOCAL_STORAGE=OFF
  HPX_WITH_THREAD_MANAGER_IDLE_BACKOFF=ON
  HPX_WITH_THREAD_QUEUE_WAITTIME=OFF
  HPX_WITH_THREAD_STACK_MMAP=OFF
  HPX_WITH_THREAD_STEALING_COUNTS=OFF
  HPX_WITH_THREAD_TARGET_ADDRESS=OFF
  HPX_WITH_TIMER_POOL=ON
  HPX_WITH_TUPLE_RVALUE_SWAP=ON
  HPX_WITH_VALGRIND=OFF
  HPX_WITH_VERIFY_LOCKS=ON
  HPX_WITH_VERIFY_LOCKS_BACKTRACE=OFF

Module allocator_support:
  HPX_ALLOCATOR_SUPPORT_WITH_CACHING=ON

Module command_line_handling_local:
  HPX_COMMAND_LINE_HANDLING_LOCAL_WITH_JSON_CONFIGURATION_FILES=OFF

Module coroutines:
  HPX_COROUTINES_WITH_SWAP_CONTEXT_EMULATION=ON
  HPX_COROUTINES_WITH_THREAD_SCHEDULE_HINT_RUNS_AS_CHILD=ON

Module datastructures:
  HPX_DATASTRUCTURES_WITH_ADAPT_STD_TUPLE=OFF
  HPX_DATASTRUCTURES_WITH_ADAPT_STD_VARIANT=OFF

Module logging:
  HPX_LOGGING_WITH_SEPARATE_DESTINATIONS=ON

Module serialization:
  HPX_SERIALIZATION_WITH_ALLOW_CONST_TUPLE_MEMBERS=OFF
  HPX_SERIALIZATION_WITH_ALLOW_RAW_POINTER_SERIALIZATION=OFF
  HPX_SERIALIZATION_WITH_ALL_TYPES_ARE_BITWISE_SERIALIZABLE=OFF
  HPX_SERIALIZATION_WITH_BOOST_TYPES=ON
  HPX_SERIALIZATION_WITH_SUPPORTS_ENDIANESS=OFF

Module topology:
  HPX_TOPOLOGY_WITH_ADDITIONAL_HWLOC_TESTING=OFF

{version}: V1.11.0-trunk (AGAS: V3.0), Git: e8cde1b8f9
{boost}: V1.85.0
{build-type}: debug
{date}: Feb 24 2025 10:45:52
{platform}: Win32
{compiler}: Microsoft Visual C++ version 1944
{stdlib}: Dinkumware standard library version 650
{stack-trace}: 20 frames:
00007FFA73DE734E: hpx::util::trace_on_new_stack +0x7e
00007FFA73DDD3F4: hpx::detail::custom_exception_info +0xf4
00007FFA74FCBF13: std::invoke<hpx::exception_info (__cdecl*&)(std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,long,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,long,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &> +0x53
00007FFA74FD9025: std::_Func_impl_no_alloc<hpx::exception_info (__cdecl*)(std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,long,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &),hpx::exception_info,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,long,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &>::_Do_call +0x55
00007FFA73CE61BA: std::_Func_class<hpx::exception_info,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,long,std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &>::operator() +0x8a
00007FFA73CDA554: hpx::detail::construct_custom_exception<hpx::exception> +0xd4
00007FFA73CDFE20: hpx::detail::get_exception<hpx::exception> +0x90
00007FFA73DEA7E7: hpx::detail::assertion_handler +0x477
00007FFA73BF9E49: hpx::assertion::detail::handle_assert +0xd9
00007FFA75018179: hpx::parcelset::decode_chunks_zero_copy<hpx::parcelset::parcel_buffer<std::vector<char,std::allocator<char> >,hpx::serialization::serialization_chunk> > +0x919
00007FFA75018F72: hpx::parcelset::decode_message_zero_copy<hpx::parcelset::policies::mpi::parcelport,hpx::parcelset::parcel_buffer<std::vector<char,std::allocator<char> >,hpx::serialization::serialization_chunk> > +0x52
00007FFA75019172: hpx::parcelset::decode_parcels_zero_copy<hpx::parcelset::policies::mpi::parcelport,hpx::parcelset::parcel_buffer<std::vector<char,std::allocator<char> >,hpx::serialization::serialization_chunk> > +0x42
00007FFA7504647E: hpx::parcelset::policies::mpi::receiver_connection<hpx::parcelset::policies::mpi::parcelport>::receive_chunks +0x5ee
00007FFA750343C9: hpx::parcelset::policies::mpi::receiver_connection<hpx::parcelset::policies::mpi::parcelport>::ack_data +0x59
00007FFA75045CA5: hpx::parcelset::policies::mpi::receiver_connection<hpx::parcelset::policies::mpi::parcelport>::receive +0xc5
00007FFA75047895: hpx::parcelset::policies::mpi::receiver<hpx::parcelset::policies::mpi::parcelport>::receive_messages +0x45
00007FFA7503670D: hpx::parcelset::policies::mpi::receiver<hpx::parcelset::policies::mpi::parcelport>::background_work +0x11d
00007FFA75036869: hpx::parcelset::policies::mpi::parcelport::background_work +0xa9
00007FFA7503AD38: hpx::parcelset::parcelport_impl<hpx::parcelset::policies::mpi::parcelport>::do_background_work_impl +0x38
00007FFA7503ACE3: hpx::parcelset::parcelport_impl<hpx::parcelset::policies::mpi::parcelport>::do_background_work +0x33
{locality-id}: 1
{hostname}: [ (mpi:1) (tcp:127.0.0.1:7911) ]
{process-id}: 23600
{os-thread}: 0, locality#1/worker-thread#0
{thread-id}: 00000000f73e0c00
{thread-description}: background_work
{state}: state::running
{auxinfo}:
{file}: D:\Devel\hpx\libs\full\parcelset\include\hpx\parcelset\decode_parcels.hpp
{line}: 204
{function}: class std::vector<struct hpx::serialization::serialization_chunk,class std::allocator<struct hpx::serialization::serialization_chunk> > __cdecl hpx::parcelset::decode_chunks_zero_copy<struct hpx::parcelset::parcel_buffer<class std::vector<char,class std::allocator<char> >,struct hpx::serialization::serialization_chunk>>(struct hpx::parcelset::parcel_buffer<class std::vector<char,class std::allocator<char> >,struct hpx::serialization::serialization_chunk> &)
{what}: Assertion 'chunks[i].size_ != 0' failed: HPX(assertion_failure)

Does this ring a bell?
