Change scatter/gather to allow custom/partial iteration over particles #326

Open
aliemen opened this issue Dec 2, 2024 · 3 comments · May be fixed by #327
Labels: enhancement (New feature or request)

Comments

@aliemen (Collaborator) commented Dec 2, 2024

Goal

Integrating energy binning for the field solver (needed in OPAL-X) requires the ability to scatter only the particles inside a specific bin. Since the particles might not be sorted by their bin index in the bunch, there are two aspects to consider:

  1. Tell scatter(...) to iterate over a custom Kokkos::RangePolicy(...) passed by the user.
  2. Since Kokkos does not support iteration over an arbitrary set of indices directly, add the possibility to pass an ippl::hash_type that defines the indices to iterate over.

To get the full field solution on a particle (combined from every individual bin solution), gather(...) needs to add the field value to the attribute, not overwrite it ("+=" instead of "=").
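
As a rough sketch, the intended per-bin workflow would look like the following. It uses the interfaces proposed below; the bunch members q, E, R, the fields rho and E_field, the solver, and the indicesOfBin helper are illustrative names only, not part of this proposal.

// Sketch only: per-bin field solve with partial scatter and accumulating gather.
using policy_type = Kokkos::RangePolicy<typename Field_t::execution_space>;  // Field_t: placeholder field type

bunch.E = 0.0;                                   // reset the particle attribute once (assumes scalar assignment on the attribute)
for (int b = 0; b < numBins; ++b) {
    rho = 0.0;                                   // charge density of this bin only
    auto binIndices = bunch.indicesOfBin(b);     // hypothetical helper returning an ippl::hash_type
    scatter(bunch.q, rho, bunch.R, policy_type(0, binIndices.extent(0)), binIndices);

    solver.solve();                              // field solution of bin b only

    gather(bunch.E, E_field, bunch.R, true);     // "+=": accumulate this bin's contribution
}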

Proposed Changes - scatter

Modify ParticleAttrib::scatter to accept a custom range policy as well as an ippl::hash_type:

template <typename T, class... Properties>
template <typename Field, class PT, typename policy_type>
void ParticleAttrib<T, Properties...>::scatter(
    Field& f, const ParticleAttrib<Vector<PT, Field::dim>, Properties...>& pp,
+   policy_type iteration_policy, hash_type hash_array) const {
    ...
+   const bool useHashView = hash_array.extent(0) > 0;
    if (useHashView && (iteration_policy.end() > hash_array.extent(0))) {
        Inform m("scatter");
        m << "Hash array was passed to scatter, but size does not match iteration policy." << endl;
        ippl::Comm->abort();
    }
    Kokkos::parallel_for(
        "ParticleAttrib::scatter", iteration_policy,
        KOKKOS_CLASS_LAMBDA(const size_t idx) {
            // map index to possible hash_map
+           size_t mapped_idx = useHashView ? hash_array(idx) : idx;

            // find nearest grid point
            vector_type l                        = (pp(mapped_idx) - origin) * invdx + 0.5;
            Vector<int, Field::dim> index        = l;
            Vector<PositionType, Field::dim> whi = l - index;
            Vector<PositionType, Field::dim> wlo = 1.0 - whi;

            Vector<size_t, Field::dim> args = index - lDom.first() + nghost;

            // scatter
+           const value_type& val = dview_m(mapped_idx);
            detail::scatterToField(std::make_index_sequence<1 << Field::dim>{}, view, wlo, whi,
                                    args, val);
        });
    ...
}

It uses the hash array if a non-empty hash_array is passed and sanity-checks that the iteration policy does not exceed the number of available elements. The main change is just the use of mapped_idx instead of idx. This does not create any branching on the GPU, since useHashView is constant across the kernel.
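
For illustration, one possible way (an assumption, not part of this proposal) to build such a hash array for a single bin is a count followed by a final-pass Kokkos::parallel_scan over a view that stores each particle's bin index:

// Sketch: collect the local indices of all particles that belong to `bin`.
// `hash_type` is assumed to be a 1D Kokkos view of indices, as used by scatter above;
// `binIndexOfParticle` is a hypothetical view holding each particle's bin index.
template <typename hash_type, typename bin_view_type>
hash_type indicesOfBin(const bin_view_type& binIndexOfParticle, int bin, size_t localNum) {
    // 1) count the particles in this bin
    size_t count = 0;
    Kokkos::parallel_reduce(
        "indicesOfBin::count", localNum,
        KOKKOS_LAMBDA(const size_t i, size_t& c) { c += (binIndexOfParticle(i) == bin); },
        count);

    // 2) write each matching particle index at its rank within the bin
    hash_type indices("binIndices", count);
    Kokkos::parallel_scan(
        "indicesOfBin::fill", localNum,
        KOKKOS_LAMBDA(const size_t i, size_t& pos, const bool final) {
            const bool inBin = (binIndexOfParticle(i) == bin);
            if (final && inBin) {
                indices(pos) = i;
            }
            pos += inBin;
        });
    return indices;
}

The scatter for one bin would then be called with policy_type(0, indices.extent(0)) together with this view.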

Non-Class scatter Interface

To preserve the old behaviour, I added a second overload of the non-class interface:

template <typename Attrib1, typename Field, typename Attrib2, 
+         typename policy_type = Kokkos::RangePolicy<typename Field::execution_space>>
inline void scatter(const Attrib1& attrib, Field& f, const Attrib2& pp) {
    attrib.scatter(f, pp, policy_type(0, attrib.getParticleCount())); 
}

// Second implementation for custom range policy, but without index array
template <typename Attrib1, typename Field, typename Attrib2, 
+         typename policy_type = Kokkos::RangePolicy<typename Field::execution_space>>
inline void scatter(const Attrib1& attrib, Field& f, const Attrib2& pp, 
+                   policy_type iteration_policy, typename Attrib1::hash_type hash_array = {}) {
    attrib.scatter(f, pp, iteration_policy, hash_array);
}

The default policy uses the same execution space as the field, and the hash array uses the hash_type already defined inside the attribute.
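
Assuming these overloads (Field_t, q, rho, R, and binIndices are placeholder names), the three calling forms would look like:

// 1) Existing call: scatter all particles (behaviour unchanged).
scatter(q, rho, R);

// 2) Custom range, no hash array (here equivalent to the call above).
using policy_type = Kokkos::RangePolicy<typename Field_t::execution_space>;
scatter(q, rho, R, policy_type(0, q.getParticleCount()));

// 3) Scatter only the particles whose indices are listed in `binIndices`.
scatter(q, rho, R, policy_type(0, binIndices.extent(0)), binIndices);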

Proposed Changes - gather

Modify ParticleAttrib::gather to take a bool that switches the assignment from "=" to "+=":

template <typename T, class... Properties>
template <typename Field, typename P2>
void ParticleAttrib<T, Properties...>::gather(
    ...
    Kokkos::parallel_for(
        "ParticleAttrib::gather", policy_type(0, *(this->localNum_mp)),
        KOKKOS_CLASS_LAMBDA(const size_t idx) {
            ...

            // gather
            value_type gathered = detail::gatherFromField(std::make_index_sequence<1 << Field::dim>{},
                                                            view, wlo, whi, args);
+           if (addToAttribute) dview_m(idx) += gathered;
+           else                dview_m(idx)  = gathered;
        });
    ...
}

It stores the gathered value and either overwrites or adds to dview_m(idx). A custom iteration policy is not necessary here, since every per-bin field solution still affects all particles.

Non-Class gather Interface

As before, to preserve the old behaviour, the non-class gather uses a default argument:

template <typename Attrib1, typename Field, typename Attrib2>
inline void gather(Attrib1& attrib, Field& f, const Attrib2& pp, 
+                  const bool addToAttribute = false) {
    attrib.gather(f, pp, addToAttribute);
}

Testing

I added another test file TestHashedScatter and modified the existing TestGather.

@aliemen added the "enhancement" and "work in progress" labels on Dec 2, 2024
@aaadelmann (Member) commented:

Before this goes into master, I want to see that all Alpine tests run correctly, multi-rank on CPU & GPU!

@aliemen (Collaborator, Author) commented Dec 4, 2024

Update: I fixed the segmentation fault in the new "TestHashedScatter.cpp" test case. It now works for OpenMP and CUDA, both single- and multi-rank (tested with 1, 2, and 4 ranks).

@aaadelmann Yes absolutely! So far everything I tested in LandauDamping worked without a problem. I will test the rest of Alpine tomorrow morning and then do the pull request.

@aliemen (Collaborator, Author) commented Dec 5, 2024

I ran LandauDamping, PenningTrap, and BumponTailInstability on both CPU and GPU with 1, 2, and 4 ranks each and got no numerical differences in the results (I compared the .out files). The timings are also not affected:

| Test Case             | Backend | Ranks | New Implementation (s) | Old Implementation (s) |
|-----------------------|---------|-------|------------------------|------------------------|
| LandauDamping         | GPU     | 1     | 0.4614                 | 0.1353                 |
| LandauDamping         | GPU     | 2     | 0.1990                 | 0.1525                 |
| LandauDamping         | GPU     | 4     | 0.2976                 | 0.2324                 |
| LandauDamping         | CPU     | 1     | 0.3410                 | 0.3707                 |
| LandauDamping         | CPU     | 2     | 0.3247                 | 0.4160                 |
| LandauDamping         | CPU     | 4     | 13.7412                | 22.7329                |
| PenningTrap           | GPU     | 1     | 0.2969                 | 0.3137                 |
| PenningTrap           | GPU     | 2     | 0.8176                 | 0.8224                 |
| PenningTrap           | GPU     | 4     | 2.8852                 | 2.8372                 |
| PenningTrap           | CPU     | 1     | 4.9564                 | 6.3673                 |
| PenningTrap           | CPU     | 2     | 4.7017                 | 7.2059                 |
| PenningTrap           | CPU     | 4     | 240.8940               | 240.8940               |
| BumponTailInstability | GPU     | 1     | 0.0690                 | 0.0671                 |
| BumponTailInstability | GPU     | 2     | 0.2186                 | 0.2265                 |
| BumponTailInstability | GPU     | 4     | 0.2986                 | 0.3326                 |
| BumponTailInstability | CPU     | 1     | 0.4648                 | 0.4648                 |
| BumponTailInstability | CPU     | 2     | 0.4299                 | 0.4299                 |
| BumponTailInstability | CPU     | 4     | 15.2093                | 15.2093                |

All differences are very likely just fluctuations in the available resources on merlin: in no test case does the scatter account for the time difference (nothing else in the code changed).

Numerical differences were visible only at machine precision across all 36 tests, e.g. 5.6545731599267274e+03 vs 5.6545731599267265e+03 (...74 vs ...65).

@aliemen removed the "work in progress" label on Dec 5, 2024