Change scatter/gather to allow custom/partial iteration over particles #326

Open
aliemen opened this issue Dec 2, 2024 · 3 comments · May be fixed by #327
Labels: enhancement (New feature or request)

Comments

@aliemen (Collaborator) commented Dec 2, 2024

Goal

Integrating energy binning for the field solver (needed in OPAL-X) requires the ability to scatter only the particles inside a specific bin. Since the particles might not be sorted by their bin index in the bunch, there are two aspects to consider:

  1. Tell scatter(...) to iterate over a custom Kokkos::RangePolicy(...) passed by the user.
  2. Since Kokkos does not support iteration over an arbitrary set of indices directly, add the possibility to pass an ippl::hash_type that defines the indices to iterate over.

To get the full field solution on a particle (combined from every individual bin solution), gather(...) needs to add the field value to the attribute, not overwrite it ("+=" instead of "=").
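
As a rough sketch, the intended per-bin workflow would look like the following. It uses the interfaces proposed below; the bunch members q, E, R, the fields rho and E_field, the solver, and the indicesOfBin helper are illustrative names only, not part of this proposal.

// Sketch only: per-bin field solve with partial scatter and accumulating gather.
using policy_type = Kokkos::RangePolicy<typename Field_t::execution_space>;  // Field_t: placeholder field type

bunch.E = 0.0;                                   // reset the particle attribute once (assumes scalar assignment on the attribute)
for (int b = 0; b < numBins; ++b) {
    rho = 0.0;                                   // charge density of this bin only
    auto binIndices = bunch.indicesOfBin(b);     // hypothetical helper returning an ippl::hash_type
    scatter(bunch.q, rho, bunch.R, policy_type(0, binIndices.extent(0)), binIndices);

    solver.solve();                              // field solution of bin b only

    gather(bunch.E, E_field, bunch.R, true);     // "+=": accumulate this bin's contribution
}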

Proposed Changes - scatter

Modify ParticleAttrib::scatter to accept a custom range policy as well as an ippl::hash_type:

template <typename T, class... Properties>
template <typename Field, class PT, typename policy_type>
void ParticleAttrib<T, Properties...>::scatter(
    Field& f, const ParticleAttrib<Vector<PT, Field::dim>, Properties...>& pp,
+   policy_type iteration_policy, hash_type hash_array) const {
    ...
+   const bool useHashView = hash_array.extent(0) > 0;
    if (useHashView && (iteration_policy.end() > hash_array.extent(0))) {
        Inform m("scatter");
        m << "Hash array was passed to scatter, but size does not match iteration policy." << endl;
        ippl::Comm->abort();
    }
    Kokkos::parallel_for(
        "ParticleAttrib::scatter", iteration_policy,
        KOKKOS_CLASS_LAMBDA(const size_t idx) {
            // map index to possible hash_map
+           size_t mapped_idx = useHashView ? hash_array(idx) : idx;

            // find nearest grid point
            vector_type l                        = (pp(mapped_idx) - origin) * invdx + 0.5;
            Vector<int, Field::dim> index        = l;
            Vector<PositionType, Field::dim> whi = l - index;
            Vector<PositionType, Field::dim> wlo = 1.0 - whi;

            Vector<size_t, Field::dim> args = index - lDom.first() + nghost;

            // scatter
+           const value_type& val = dview_m(mapped_idx);
            detail::scatterToField(std::make_index_sequence<1 << Field::dim>{}, view, wlo, whi,
                                    args, val);
        });
    ...
}

It uses the hash array if a non-empty hash_array is passed and sanity-checks that the iteration policy does not exceed the number of available elements. The main change is just the use of mapped_idx instead of idx. This does not create any branching on the GPU, since useHashView is constant across the kernel.
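
For illustration, one possible way (an assumption, not part of this proposal) to build such a hash array for a single bin is a count followed by a final-pass Kokkos::parallel_scan over a view that stores each particle's bin index:

// Sketch: collect the local indices of all particles that belong to `bin`.
// `hash_type` is assumed to be a 1D Kokkos view of indices, as used by scatter above;
// `binIndexOfParticle` is a hypothetical view holding each particle's bin index.
template <typename hash_type, typename bin_view_type>
hash_type indicesOfBin(const bin_view_type& binIndexOfParticle, int bin, size_t localNum) {
    // 1) count the particles in this bin
    size_t count = 0;
    Kokkos::parallel_reduce(
        "indicesOfBin::count", localNum,
        KOKKOS_LAMBDA(const size_t i, size_t& c) { c += (binIndexOfParticle(i) == bin); },
        count);

    // 2) write each matching particle index at its rank within the bin
    hash_type indices("binIndices", count);
    Kokkos::parallel_scan(
        "indicesOfBin::fill", localNum,
        KOKKOS_LAMBDA(const size_t i, size_t& pos, const bool final) {
            const bool inBin = (binIndexOfParticle(i) == bin);
            if (final && inBin) {
                indices(pos) = i;
            }
            pos += inBin;
        });
    return indices;
}

The scatter for one bin would then be called with policy_type(0, indices.extent(0)) together with this view.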

Non-Class scatter Interface

To preserve the old behaviour, I added a second overload of the non-class interface:

template <typename Attrib1, typename Field, typename Attrib2, 
+         typename policy_type = Kokkos::RangePolicy<typename Field::execution_space>>
inline void scatter(const Attrib1& attrib, Field& f, const Attrib2& pp) {
    attrib.scatter(f, pp, policy_type(0, attrib.getParticleCount())); 
}

// Second implementation for custom range policy, but without index array
template <typename Attrib1, typename Field, typename Attrib2, 
+         typename policy_type = Kokkos::RangePolicy<typename Field::execution_space>>
inline void scatter(const Attrib1& attrib, Field& f, const Attrib2& pp, 
+                   policy_type iteration_policy, typename Attrib1::hash_type hash_array = {}) {
    attrib.scatter(f, pp, iteration_policy, hash_array);
}

The default policy uses the same execution space as the field, and the hash array uses the hash_type already defined inside the attribute.
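
Assuming these overloads (Field_t, q, rho, R, and binIndices are placeholder names), the three calling forms would look like:

// 1) Existing call: scatter all particles (behaviour unchanged).
scatter(q, rho, R);

// 2) Custom range, no hash array (here equivalent to the call above).
using policy_type = Kokkos::RangePolicy<typename Field_t::execution_space>;
scatter(q, rho, R, policy_type(0, q.getParticleCount()));

// 3) Scatter only the particles whose indices are listed in `binIndices`.
scatter(q, rho, R, policy_type(0, binIndices.extent(0)), binIndices);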

Proposed Changes - gather

Modify ParticleAttrib::gather to take a bool that switches the assignment from "=" to "+=":

template <typename T, class... Properties>
template <typename Field, typename P2>
void ParticleAttrib<T, Properties...>::gather(
    ...
    Kokkos::parallel_for(
        "ParticleAttrib::gather", policy_type(0, *(this->localNum_mp)),
        KOKKOS_CLASS_LAMBDA(const size_t idx) {
            ...

            // gather
            value_type gathered = detail::gatherFromField(std::make_index_sequence<1 << Field::dim>{},
                                                            view, wlo, whi, args);
+           if (addToAttribute) dview_m(idx) += gathered;
+           else                dview_m(idx)  = gathered;
        });
    ...
}

It stores the gathered value and either overwrites or adds to dview_m(idx). A custom iteration policy is not necessary here, since every per-bin field solution still affects all particles.

Non-Class gather Interface

As before, to preserve the old behaviour, the non-class gather uses a default argument:

template <typename Attrib1, typename Field, typename Attrib2>
inline void gather(Attrib1& attrib, Field& f, const Attrib2& pp, 
+                  const bool addToAttribute = false) {
    attrib.gather(f, pp, addToAttribute);
}

Testing

I added another test file TestHashedScatter and modified the existing TestGather.

@aliemen added the "enhancement" and "work in progress" labels on Dec 2, 2024
@aaadelmann (Member) commented:

Before this goes into master, I want to see that all Alpine tests run correctly, multi-rank on CPU & GPU!

@aliemen (Collaborator, Author) commented Dec 4, 2024

Update: I fixed the segmentation fault in the new "TestHashedScatter.cpp" test case. It now works for OpenMP and CUDA, both single- and multi-rank (tested with 1, 2, and 4 ranks).

@aaadelmann Yes absolutely! So far everything I tested in LandauDamping worked without a problem. I will test the rest of Alpine tomorrow morning and then do the pull request.

@aliemen (Collaborator, Author) commented Dec 5, 2024

I ran LandauDamping, PenningTrap, and BumponTailInstability on both CPU and GPU with 1, 2, and 4 ranks each and got no numerical differences in the results (I compared the .out files). The timings are also not affected:

| Test Case             | Backend | Ranks | New Implementation (s) | Old Implementation (s) |
|-----------------------|---------|-------|------------------------|------------------------|
| LandauDamping         | GPU     | 1     | 0.4614                 | 0.1353                 |
| LandauDamping         | GPU     | 2     | 0.1990                 | 0.1525                 |
| LandauDamping         | GPU     | 4     | 0.2976                 | 0.2324                 |
| LandauDamping         | CPU     | 1     | 0.3410                 | 0.3707                 |
| LandauDamping         | CPU     | 2     | 0.3247                 | 0.4160                 |
| LandauDamping         | CPU     | 4     | 13.7412                | 22.7329                |
| PenningTrap           | GPU     | 1     | 0.2969                 | 0.3137                 |
| PenningTrap           | GPU     | 2     | 0.8176                 | 0.8224                 |
| PenningTrap           | GPU     | 4     | 2.8852                 | 2.8372                 |
| PenningTrap           | CPU     | 1     | 4.9564                 | 6.3673                 |
| PenningTrap           | CPU     | 2     | 4.7017                 | 7.2059                 |
| PenningTrap           | CPU     | 4     | 240.8940               | 240.8940               |
| BumponTailInstability | GPU     | 1     | 0.0690                 | 0.0671                 |
| BumponTailInstability | GPU     | 2     | 0.2186                 | 0.2265                 |
| BumponTailInstability | GPU     | 4     | 0.2986                 | 0.3326                 |
| BumponTailInstability | CPU     | 1     | 0.4648                 | 0.4648                 |
| BumponTailInstability | CPU     | 2     | 0.4299                 | 0.4299                 |
| BumponTailInstability | CPU     | 4     | 15.2093                | 15.2093                |

All differences are very likely just fluctuations in the available resources on merlin: in no test case does the scatter account for the time difference (nothing else in the code changed).

Numerical differences were visible only at machine precision across all 36 tests, e.g. 5.6545731599267274e+03 vs 5.6545731599267265e+03 (...74 vs ...65).

@aliemen removed the "work in progress" label on Dec 5, 2024