Remove MPIData (#270)
* Test segmented buffer

* implement a buffer for alltoall

* fix buffer communication

* implement strategy

* add reporting for communication strats

* add reporting for point to point

* fix bug with NotDistributed

* Change initialisation, add check to all MPI calls PDVec makes.

* MPI workaround

* Make AllToAll the default

* add dot for alltoall

* Fix broken tests

* Docstrings, make sure both strategies are tested

* Fix broken test, more parallelism

* remove print

* more parallelism

* Fix communicator switching in tests

* test that communicator is preserved

* more docstring updates

* some renames, docstring updates

* report wait time

* add show methods to communicators

* More links and explanations

* Small tweak in docstring

* remove MPIData

* remove RMPI, move helpers to toplevel

* scrub all mentions of RMPI

* add missing file

* update MPI.Allreduce overload

* Make sum work on Apple Silicon

* Test on mac

* only include macos for v1

* fix action config

* fix path for macos

---------

Co-authored-by: Joachim Brand <[email protected]>
mtsch and joachimbrand authored Nov 25, 2024
1 parent 83f439f commit 0a137fc
Showing 31 changed files with 130 additions and 1,463 deletions.
8 changes: 5 additions & 3 deletions .github/workflows/actions.yml
@@ -9,8 +9,10 @@ jobs:
strategy:
matrix:
julia-version: ['1', 'nightly', '1.9']
julia-arch: [x64]
os: [ubuntu-latest]
include:
- julia-version: '1'
os: macos-latest
fail-fast: false
steps:
- name: "Checkout"
@@ -19,7 +21,6 @@
uses: julia-actions/setup-julia@v2
with:
version: ${{ matrix.julia-version }}
arch: ${{ matrix.julia-arch }}
- name: "Load cache"
uses: julia-actions/cache@v2
- name: "Build"
@@ -39,7 +40,8 @@
# with Pkg.develop(path="."). using Rimu, KrylovKit, StaticArrays at the end
# ensures everything is precompiled before the MPI job starts.
julia --color=yes --project=test -e "using Pkg; Pkg.instantiate(); Pkg.develop(path=\".\"); Pkg.add(\"MPI\"); Pkg.build(); using MPI; MPI.install_mpiexecjl(); using Rimu, KrylovKit, StaticArrays"
export PATH=$PATH:/home/runner/.julia/bin
export PATH=$PATH:/home/runner/.julia/bin # for linux
export PATH=$PATH:/Users/runner/.julia/bin # for macos
mpiexecjl -n 2 julia --code-coverage=user --depwarn=yes --project=test test/mpi_runtests.jl
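The two `PATH` exports are needed because `MPI.install_mpiexecjl()` places the wrapper script in the Julia depot's `bin` directory, which lives under a different home directory on the Linux and macOS runners. A possible alternative, sketched here under the assumption that MPI.jl's documented `destdir` keyword is used, would be to install the wrapper into a fixed location instead:

```julia
using MPI

# Install the mpiexecjl wrapper into a fixed directory (path chosen for illustration
# only) so that a single PATH entry works on both Linux and macOS runners.
MPI.install_mpiexecjl(destdir="/tmp/julia-bin")
```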
1 change: 0 additions & 1 deletion docs/make.jl
@@ -64,7 +64,6 @@ makedocs(;
"Dict vectors" => "dictvectors.md",
"BitString addresses" => "addresses.md",
"Stochastic styles" => "stochasticstyles.md",
"RMPI" => "RMPI.md",
"I/O" => "rimuio.md",
"Random numbers" => "randomnumbers.md",
"Documentation generation" => "documentation.md",
4 changes: 0 additions & 4 deletions docs/src/API.md
@@ -34,10 +34,6 @@ See [Module `DictVectors`](@ref)

See [Module `StatsTools`](@ref)

## RMPI

See [Module `RMPI`](@ref)

# Index

```@index
43 changes: 0 additions & 43 deletions docs/src/RMPI.md

This file was deleted.

3 changes: 1 addition & 2 deletions docs/src/index.md
@@ -113,8 +113,7 @@ needs to be done for communicating between different processes.
Using MPI parallelism with `Rimu` is easy. MPI is enabled automatically if
[`PDVec`](@ref) is used to store a vector. In that case, data will be stored in a
distributed fashion among the MPI ranks and only communicated between ranks when
necessary. Additional MPI-related functionality is provided by the module [`RMPI`](@ref
Rimu.RMPI).
necessary.
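
As a minimal sketch (the starting address and value are arbitrary), storing a vector in a [`PDVec`](@ref) is all that is needed to distribute it:

```julia
using Rimu

# Under MPI, the entries of a PDVec are spread across the ranks and only
# communicated between ranks when necessary.
dv = PDVec(BoseFS((1, 1, 1, 1)) => 1.0)
```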

## Compatibility

11 changes: 3 additions & 8 deletions docs/src/mpi.md
@@ -5,9 +5,6 @@ MPI should be fairly straightforward. Generally, [`PDVec`](@ref Main.DictVectors
work with MPI automatically, as long as MPI is set up correctly and a few common pitfalls
are avoided.

Rimu includes an unexported module [`RMPI`](@ref Main.Rimu.RMPI), which must be imported to access
additional MPI-related functionality.

## Configuring MPI

When running on a cluster, ensure that MPI.jl is using the system binary. See [the MPI.jl
@@ -16,7 +13,6 @@ documentation](https://juliaparallel.org/MPI.jl/latest/configuration/) for more
It is always a good idea to start your script with a quick test that ensures the MPI is set up correctly. One way to do this is to open with

```julia
using Rimu.RMPI
mpi_allprintln("hello")
```
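
Since these helpers are now provided by `Rimu` itself, no extra import is needed. A slightly fuller startup check might look as follows (sketch; the printed message is illustrative):

```julia
using Rimu

mpi_allprintln("hello")   # each rank prints the message in turn; MPI synchronizing
@mpi_root println("MPI is set up with ", mpi_size(), " rank(s)")
```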

@@ -60,10 +56,9 @@ srun mpi=pmi2 julia --project -tauto script.jl
### Using `@mpi_root`

Take care to not use reducing functions (such as `length`, `sum`, `norm`, ...) inside
[`@mpi_root`](@ref Main.Rimu.RMPI.@mpi_root) blocks. Doing so will only initiate the
distributed reduction on one rank only, which will cause the code to go out of sync and
freeze. As an example, to report the current length of a vector, calculate the length before
the [`@mpi_root`](@ref Main.Rimu.RMPI.@mpi_root) block:
[`@mpi_root`](@ref) blocks. Doing so initiates the distributed reduction on one rank only,
which causes the code to go out of sync and freeze. As an example, to report the current
length of a vector, calculate the length before the [`@mpi_root`](@ref) block:

```julia
len = length(pdvec)
```
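
Spelled out, the safe pattern computes the reduction collectively and restricts only the printing to the root rank (sketch, assuming a distributed `pdvec` already exists):

```julia
len = length(pdvec)                         # collective: every rank participates
@mpi_root println("vector length: ", len)   # printing happens on the root rank only

# By contrast, putting `length(pdvec)` inside the `@mpi_root` block would start the
# distributed reduction on one rank only and leave the other ranks waiting.
```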
2 changes: 1 addition & 1 deletion docs/src/randomnumbers.md
@@ -9,4 +9,4 @@ If you want FCIQMC runs to be reproducible, make sure to seed the RNG with
[Random.seed!](https://docs.julialang.org/en/v1/stdlib/Random/#Random.seed!).

MPI-distributed runs can also be made reproducible by seeding the RNG with
[`Rimu.RMPI.mpi_seed!`](@ref).
[`mpi_seed!`](@ref).
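
A typical pattern looks as follows (sketch; the seed value is arbitrary):

```julia
using Rimu, Random

Random.seed!(17)   # reproducible single-process runs
mpi_seed!(17)      # reproducible MPI runs: each rank is seeded deterministically,
                   # but differently, so the ranks stay statistically independent
```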
8 changes: 3 additions & 5 deletions scripts/BHM-example-mpi.jl
@@ -9,10 +9,8 @@
# [here](https://github.com/joachimbrand/Rimu.jl/blob/develop/scripts/BHM-example-mpi.jl).
# Run it with 2 MPI ranks with `mpirun -n 2 julia BHM-example-mpi.jl`.

# We start by importing `Rimu` and `Rimu.RMPI`, which contains MPI-related
# functionality.
# We start by importing `Rimu`.
using Rimu
using Rimu.RMPI

# Note that it is not necessary to initialise the MPI library, as this is already done
# automatically when Rimu is loaded.
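
A quick way to verify this is the standard MPI.jl query (sketch):

```julia
using Rimu
using MPI

MPI.Initialized()   # true: loading Rimu has already initialised the MPI library
```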
@@ -62,8 +60,8 @@ problem = ProjectorMonteCarloProblem(H;
last_step=10_000
);

# The [`@mpi_root`](@ref Main.Rimu.RMPI.@mpi_root) macro performs an action on the root rank
# only, which is useful for printing.
# The [`@mpi_root`](@ref) macro performs an action on the root rank only, which is useful
# for printing.
@mpi_root println("Running FCIQMC with ", mpi_size(), " rank(s).")

# Finally, we can run the computation.
2 changes: 2 additions & 0 deletions src/DictVectors/communicators.jl
@@ -1,3 +1,5 @@
import Rimu: mpi_rank, mpi_size, mpi_comm

struct CommunicatorError <: Exception
msg::String
end
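
These helpers are now provided at the top level of `Rimu` rather than in `Rimu.RMPI`; a minimal usage sketch:

```julia
using Rimu

mpi_rank()   # 0-based rank of the current process
mpi_size()   # total number of MPI ranks
mpi_comm()   # the MPI communicator used by Rimu (MPI.COMM_WORLD by default)
```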
8 changes: 7 additions & 1 deletion src/DictVectors/pdvec.jl
@@ -186,7 +186,7 @@ function PDVec{K,V,N}(

# This is a bit clunky. If you modify the communicator by hand, you have to make sure it
# knows to hold values of type W. When we introduce more communicators, they should
# probably be constructed by a function, similar to how it's done in RMPI.
# probably be constructed by a function.
IW = initiator_valtype(irule, W)
if isnothing(communicator)
if MPI.Comm_size(MPI.COMM_WORLD) > 1
@@ -531,6 +531,12 @@ function Base.mapreduce(f::F, op::O, t::PDVecIterator; kwargs...) where {F,O}
return merge_remote_reductions(t.vector.communicator, op, result)
end

# The following method is required to make `sum` work for PDVecs with MPI on ARM processors.
# The reason is that `sum` uses a non-default reduction operator, which is not supported by
# MPI.jl on non-Intel processors. This method is a workaround that uses the default
# reduction operator.
Base.sum(f, t::PDVecIterator; kwargs...) = mapreduce(f, +, t; kwargs...)
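
In practice this means reductions such as the following go through the predefined MPI `+` operator rather than a custom one (usage sketch, assuming a distributed `pdvec` already exists):

```julia
# With the method above, the distributed part of the reduction uses `+`, which maps
# to a predefined MPI operator, instead of Base's default `add_sum` reducer.
norm_squared = sum(abs2, values(pdvec))
```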

"""
all(predicate, keys(::PDVec); kwargs...)
all(predicate, values(::PDVec); kwargs...)
1 change: 0 additions & 1 deletion src/DictVectors/projectors.jl
@@ -109,7 +109,6 @@ end

# NOTE that this returns a `Float64` opposite to the convention for
# dot to return the promote_type of the arguments.
# NOTE: This operation should work for `MPIData` and is MPI synchronizing

"""
PopsProjector() <: AbstractProjector
2 changes: 0 additions & 2 deletions src/Interfaces/dictvectors.jl
@@ -75,8 +75,6 @@ StochasticStyle(::AbstractArray{T}) where {T} = default_style(T)
Create a "frozen" version of `dv` which can no longer be modified or used in the
conventional manner, but supports faster dot products.
If `dv` is an [`MPIData`](@ref Main.Rimu.RMPI.MPIData), synchronize its contents among the ranks first.
"""
freeze(v::AbstractVector) = copy(v)

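
A brief usage sketch of `freeze` (the vector contents are illustrative):

```julia
using Rimu, LinearAlgebra

dv = DVec(BoseFS((1, 1)) => 1.0)   # any AbstractDVec works here
fv = freeze(dv)                    # read-only snapshot that supports faster dot products
overlap = dot(fv, dv)
```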
50 changes: 0 additions & 50 deletions src/RMPI/RMPI.jl

This file was deleted.
