Simplify and improve AtomicQuadTree #5

gonzalobg · 2024-06-20T10:16:48Z

This PR simplifies the owning AtomicQuadTree and its Const/Non-Const reference types into a single "reference" type that can be easily passed to parallel algorithms by value.
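To illustrate the idea of a single by-value "reference" type, here is a minimal sketch. The names `TreeStorage`, `TreeRef`, `make_ref` and `fill_mass` are hypothetical and are not taken from this PR; the actual AtomicQuadTree layout may differ. The sketch assumes the reference is just a bundle of raw pointers plus a capacity, so copying it into a lambda is cheap and every copy refers to the same storage.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical owning storage (illustration only, not the PR's class).
struct TreeStorage {
    std::vector<float> mass;
    std::vector<std::uint32_t> next;
};

// Hypothetical non-owning reference type: raw pointers plus a size.
struct TreeRef {
    float* mass;
    std::uint32_t* next;
    std::size_t capacity;
};

// Trivially copyable, so it can be captured by value in parallel lambdas.
static_assert(std::is_trivially_copyable_v<TreeRef>);

TreeRef make_ref(TreeStorage& s) {
    return TreeRef{s.mass.data(), s.next.data(), s.mass.size()};
}

// Takes the reference by value; the lambda copies it again.
// All copies write through to the same underlying storage.
void fill_mass(TreeRef tree, float value) {
    std::vector<std::size_t> ids(tree.capacity);
    std::iota(ids.begin(), ids.end(), std::size_t{0});
    // A parallel execution policy could be passed here as the first argument.
    std::for_each(ids.begin(), ids.end(), [tree, value](std::size_t i) {
        tree.mass[i] = value;
    });
}
```

Because the lambda captures `tree` by value, no reference to a host-side owning object has to stay valid inside the algorithm, which is the property that matters when the algorithm runs on a GPU.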

It also performs some initial optimizations and simplifications on AtomicQuadTree and its algorithms:

The atomic_calc_mass algorithm is optimized to process one tree node per thread (threads assigned to empty nodes just exit). The last thread to process a node advances to the next one. This allows us to remove leaf_count, which simplifies the quadtree, atomic_calc_mass, and atomic_insert. This improves performance by 1.9x on GPUs.
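The "last thread to arrive advances" pattern can be sketched as follows. This is not the PR's implementation: `Node`, `Tree`, `propagate_from` and `calc_mass` are hypothetical names, every internal node is assumed to have exactly four contiguous children, and empty leaves still signal their parent with zero mass (the PR lets threads on empty nodes exit instead). The driver below calls the per-node work sequentially, in reverse order, to show that the result does not depend on ordering; `propagate_from` only uses atomics for coordination, so each call could instead be issued from its own thread.

```cpp
#include <atomic>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::size_t no_node = static_cast<std::size_t>(-1);

// Hypothetical node layout (illustration only).
struct Node {
    std::size_t parent = no_node;       // no_node for the root
    std::size_t first_child = no_node;  // no_node for leaves, else index of 4 contiguous children
    double mass = 0.0;
};

struct Tree {
    std::vector<Node> nodes;
    std::vector<std::atomic<int>> arrivals;  // number of children finished, per node
    explicit Tree(std::vector<Node> n) : nodes(std::move(n)), arrivals(nodes.size()) {
        for (auto& a : arrivals) a.store(0, std::memory_order_relaxed);
    }
};

// Work for the thread assigned to node `start`.
void propagate_from(Tree& t, std::size_t start) {
    if (t.nodes[start].first_child != no_node) return;  // not a leaf: exit
    std::size_t cur = start;
    while (t.nodes[cur].parent != no_node) {
        const std::size_t p = t.nodes[cur].parent;
        // acq_rel: publish this subtree's mass and, if last, observe the siblings' masses.
        if (t.arrivals[p].fetch_add(1, std::memory_order_acq_rel) != 3) return;
        // Last of the four children to arrive: sum them and continue upward.
        double sum = 0.0;
        for (std::size_t c = 0; c < 4; ++c) sum += t.nodes[t.nodes[p].first_child + c].mass;
        t.nodes[p].mass = sum;
        cur = p;
    }
}

// Sequential driver: one call per tree node, no leaf count needed.
double calc_mass(Tree& t) {
    for (std::size_t i = t.nodes.size(); i-- > 0;) propagate_from(t, i);
    return t.nodes[0].mass;  // node 0 is assumed to be the root
}
```

The point of the sketch is that no separate leaf count is required: launching one unit of work per node and filtering non-leaves at the start is enough to drive the reduction.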

src/counting_iterator.h

src/kernels.h

src/atomic_quad_tree.h

src/all_pairs.h

limefax · 2024-06-21T16:08:20Z

src/kernels.h

    tree.next_nodes[0] = is_leaf<Index_t>;
    tree.node_status[0].store(NodeStatus::EmptyLeaf, memory_order_relaxed);
 }

 template<typename T, typename Index_t>
-auto clear_tree(System<T>& system, AtomicQuadTreeContainer<T, Index_t> tree) {
+auto clear_tree(System<T>& system, AtomicQuadTree<T, Index_t> tree) {
    // clear the tree, ready for next iteration
    auto r = system.body_indices();
    std::for_each_n(
        std::execution::par_unseq,
        r.begin(), tree.bump_allocator->load(memory_order_acquire),


The tree is not properly initialised in the first iteration and the first force step does not calculate acceleration properly. I think applying the for loop to the entire tree is fine.

Suggested change

-        r.begin(), tree.bump_allocator->load(memory_order_acquire),
+        r.begin(), tree.capacity,

Furthermore, I am concerned that this could run past the end of the range. I'm not sure how this works with std::views::iota, but, for example, there could be 1000 bodies, so r counts up to 1000, while there may be 4000 tree nodes.

Regarding the std::views::iota question: std::views::iota(v) (with one argument, not two) generates an unbounded integer range that starts at v and can be advanced up to numeric_limits<T>::max(). There are no values of type T greater than numeric_limits<T>::max(), so the number of leaves can't exceed it (the integer type would already have overflowed somewhere else).

The tree is not properly initialised in the first iteration and the first force step does not calculate acceleration properly.

I've tried to fix it in this commit (505dcbc) but I'm not 100% sure I did that correctly.

Also, I'm not sure whether it is better to always clean the tree up to the capacity or only up to the last allocated node, but we can explore tweaking that in a subsequent PR, so I've left things "as is".


gonzalobg · 2024-06-24T14:42:47Z

Rebased on top of main (no other changes).

limefax reviewed Jun 21, 2024


gonzalobg force-pushed the atom_quad_tree branch from 5bd7e65 to 27a83a3 · June 23, 2024 18:27

This was referenced Jun 23, 2024

Provide a collapsed implementation of all pairs #3

Merged

Update atomic_ref to use CTAD when clang 19 releases #6

Open

gonzalobg added 4 commits June 24, 2024 07:42

Simplify and Cleanup AtomicQuadTree

cc5bbb5

Remove leaf_count

99e6a71

Start one thread per tree node, and filter out threads that do not start at a leaf node, instead of preparing and updating a leaf node count.

Fix memory reclamation

08a97f8

Clean all tree nodes in first iteration

8970037

gonzalobg force-pushed the atom_quad_tree branch from 505dcbc to 8970037 · June 24, 2024 14:42

limefax merged commit a4c07f7 into UoB-HPC:main Jun 24, 2024
