Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add nompi stub #425

Open
wants to merge 28 commits into
base: develop
Choose a base branch
from
Open

Add nompi stub #425

wants to merge 28 commits into from

Conversation

bobpaw
Copy link
Collaborator

@bobpaw bobpaw commented Apr 26, 2024

Add nompi stub

Add a stub implementation of MPI in pcu/pcu_pnompi.c.
Remove direct calls to MPI functions.

This adds a SCOREC_NO_MPI build option for users running on single-process, single-machine setups that do not want to compile with MPI. Message passing is simulated in memory.

@bobpaw
Copy link
Collaborator Author

bobpaw commented Apr 26, 2024

@cwsmith, build is failing for this PR. I think we should add PCU_Init and PCU_Finalize facades that wrap the MPI calls and do a PCU_Comm_Init() and PCU_Comm_Free() to remove the direct dependencies. There are a lot of executables that would need to be changed and this would become a large PR. Alternatively, we could define a replacement MPI_Init and MPI_Finalize in pcu_pnompi and link to those. Or, we could do a hybrid approach and encourage people to use PCU_Init in the future.

@jacobmerson
Copy link
Contributor

@flagdanger and I are working on a very large update of PCU that should make this all easier by getting rid of the singleton. Pull request for that coming within the next week.

@jacobmerson
Copy link
Contributor

@bobpaw sorting out a few things to make the pull request ready to merge, but #388 the draft pull request for replacing the singleton in PCU.

Although we replaced the interface used in the test cases, this pull request is designed to work with no modification to existing codes by using a global PCU state when the old style interface is used with PCU_Comm_Init or it can use local state.

@bobpaw
Copy link
Collaborator Author

bobpaw commented Apr 30, 2024

@jacobmerson, we will plan to use your PR for the future of core, but the sponsor wants this functionality soon, so we will use this branch until yours is finished.

@cwsmith
Copy link
Contributor

cwsmith commented Apr 30, 2024

@bobpaw Does PCU_Comm_Init() check if MPI was initialized?

@bobpaw
Copy link
Collaborator Author

bobpaw commented Apr 30, 2024

@bobpaw Does PCU_Comm_Init() check if MPI was initialized?

No, at the moment it only checks if PCU has been initialized using a global state variable. We could check with MPI_Initialized.

@onkarsahni
Copy link
Contributor

@cwsmith Note Aiden is only doing this when SCOREC_NO_MPI is defined to be true, so there is no actual MPI (e.g., see pcu/pcu_pnompi.c). For now, he is not looking into PCU_init, that will be done after the other PR #388.

@cwsmith
Copy link
Contributor

cwsmith commented Apr 30, 2024

@bobpaw OK. I'm wondering if it will make sense to make the MPI check in that call, and if it fails call MPI_Init. But... handling MPI_Finalize could be a bit more tricky; other libraries may still be running and using MPI beyond the lifetime of PCU. I'd prefer symmetry in the interface so your original suggestion may be best.

@bobpaw bobpaw marked this pull request as ready for review May 6, 2024 15:13
Added pcu_pnompi.c.
Added SCOREC_NO_MPI CMake option.
Removed direct MPI calls in parma, phasta, and PUMI.
Added incomplete reimplementations of those MPI calls.
Add stubs for MPI_Init and MPI_Finalize.
Add pcu_pnompi_types.h for MPI typedefs.
Add status message about SCOREC_NO_MPI flag.

Signed-off-by: Aiden Woodruff <[email protected]>
Replace MPI_Comm_free with PCU_Comm_Free_One.

Signed-off-by: Aiden Woodruff <[email protected]>
This will avoid object collisions when linking to libraries that already use MPI.
- i.e. the intended use case for SCOREC_NO_MPI.

Signed-off-by: Aiden Woodruff <[email protected]>
Changed add_nompi_msg to record receiver.
- Since there's no MPI, sender == receiver but I want to quiet the warnings.

Signed-off-by: Aiden Woodruff <[email protected]>
Add return type for pcu_pmpi_free, pcu_pmpi_split.

Signed-off-by: Aiden Woodruff <[email protected]>
Add linked list traversal to find a matching message.
Removed error message that did not actually indicate errors.
- If no message is found it's perfectly reasonable and in fact correct to
  return 0 and that's how pmpi knows to return false.

Signed-off-by: Aiden Woodruff <[email protected]>
Run smoke tests without mpirun given SCOREC_NO_MPI.
Add SCOREC_NO_MPI to the GitHub Actions test matrix.

Signed-off-by: Aiden Woodruff <[email protected]>
I use clock_gettime, which is one of the many methods mpich uses depending on
the system.

Signed-off-by: Aiden Woodruff <[email protected]>
Also update TriBITS package file.

Signed-off-by: Aiden Woodruff <[email protected]>
- Add mpi_test_depends function to replace set_test_properties. This
  function checks if SCOREC_NO_MPI is defined and assumes nonexistent
  tests were multiproc ones.

Signed-off-by: Aiden Woodruff <[email protected]>
- parma/group/parma_group.cc(runInGroups): remove PCU/MPI free that will
  be carried out by pcu::PCU::~PCU.
- test/xgc_split.cc: remove direct mpi.h inclusion
- test/pumiLoadMesh.cc: remove direct mpi.h inclusion.
- test/modelInfo.cc: remove direct #include <mpi.h>.
- phasta/phstream.cc(::getTime): use new pcu::Time.
- test/fieldReduce.cc: use correct PCUObj.Peers.
- pumi/pumi_sys.cc(pumi_sync): use PCU Barrier instead of MPI Barrier.
- pcu/PCU.h: make PCU_Comm (holdover) functions extern C and PCU_Wtime.
- pumi/pumi_ghost.cc(do_off_part_bridge): fix typo (forgot parens).
- pumi/pumi_mesh.cc: capitalize GetMPIComm.
- pumi/pumi_numbering(pumi_numbering_print): use pcu::PCU::Barrier
  instead of WORLD barrier.
- pcu/PCU_C.h: optionally include mpi or stub.
- pcu/pcu_c.cc: add split stub and include pcu_pmpi.h.
- pcu/pcu_pmpi.c: remove leftover conflict markers.
- pcu/pcu_pnompi.h: update function signatures, make extern C, and
  remove obsolete stubs.
- pcu/pcu_pnompi.c: remove globals and use pcu_mpi_t*.
- clean up formatting.
- remove unused or (void)ed argument names.

Signed-off-by: Aiden Woodruff <[email protected]>
- pcu/PCU.cc(PCU::~PCU): clean up message order if it exists (valgrind
  is happy now).

Signed-off-by: Aiden Woodruff <[email protected]>
- pcu/pcu_pmpi.c: remove pcu_pmpi_switch, pcu_pmpi_comm which were used
  by old nompi stubs.
- (pcu_pmpi_allgather, pcu_pmpi_allreduce): break long lines and clean
  up indentation.

Signed-off-by: Aiden Woodruff <[email protected]>
- pcu/pcu_pnompi.c: restore function arguments in definitions that were
  removed in b76d8f1. Turns out they are required in C and the local
  compiler was not flagging it.

Signed-off-by: Aiden Woodruff <[email protected]>
@bobpaw
Copy link
Collaborator Author

bobpaw commented Feb 6, 2025

@cwsmith, after seeing the new PCU object introduced by #433, we would still like to take this build-time stubbing approach. This PR results in compilation of the entire repository without MPI dependence. Since I merged develop a few times, should we squash this PR?

@bobpaw bobpaw requested a review from cwsmith February 6, 2025 15:30
Copy link
Contributor

@cwsmith cwsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you. There are a few comments below.

Yes, I agree, we should squash-merge.

CMakeLists.txt Outdated Show resolved Hide resolved
Comment on lines +5 to +13
#include "pcu_pnompi_types.h"
// Remove MPI calls.
#define MPI_Init(argc, argv) do { \
(void) argc; \
(void) argv; \
} while (0)
#define MPI_Finalize(void)
#ifdef __cplusplus
extern "C"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the scope of these macros? We still want to let users of PUMI run their application in parallel even if PUMI is only built for serial execution.

I think we should add a test that builds an MPI hello world app and links against a 'SCOREC_NO_MPI=on' build, and loads a serial pumi mesh etc.. This will help ensure that the symbols don't leak in the long run.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good point. Right now, those macros would spread to every source that includes PCU.h. I think a simple #ifndef MPI_COMM_WORLD may suffice in these cases. Then PUMI tests will use the stub MPI_Init and end-users can #include <mpi.h> before PCU.h and they will get the real versions.

I will work on making a test case so that we can observe the negative first before trying that fix.

Also, it would require a major notice to end-users (since include order is not usually important). What do you think?

Copy link
Contributor

@cwsmith cwsmith Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When building with SCOREC_NO_MPI=off then I assume that everything that worked prior to this PR would continue to work. Is that correct? If so, then I think we could count this as a new 'feature' with specific requirements (i.e., include order, no mpi headers, etc.).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a test that builds an MPI hello world app and links against a 'SCOREC_NO_MPI=on' build, and loads a serial pumi mesh etc.

Should this be a GitHub action, similar to building the doc/ folder?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I see the Developer Guide on the Wiki which I will update after merging this PR. Is the user guide LaTeX anywhere? A quick Ctrl+F doesn't show any reference to any of the SCOREC_* build options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a GitHub action, similar to building the doc/ folder?

Yes.

latex

The latex files for the user guide are here:

https://github.com/SCOREC/pumi-docs/tree/master/userGuide

Comment on lines +5 to +13
#include "pcu_pnompi_types.h"
// Remove MPI calls.
#define MPI_Init(argc, argv) do { \
(void) argc; \
(void) argv; \
} while (0)
#define MPI_Finalize(void)
#ifdef __cplusplus
extern "C"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same scope question as in PCU.h

Comment on lines +13 to +20
typedef int MPI_Comm;
typedef int MPI_Request;
typedef int MPI_Datatype;
typedef int MPI_Op;
#define MPI_COMM_WORLD 0
#define MPI_ANY_SOURCE -1
#define MPI_SUM 1
#define MPI_INT 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same scope question as in PCU.h

@@ -11,7 +11,6 @@
#include <unistd.h>
#include <iostream>
#include <algorithm>
#include <mpi.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If SCOREC_NO_MPI=ON, then unless the user is compiling with the mpi wrappers, then source code and tests cannot #include <mpi.h>. Is that correct? If so, we should document that in the users and developers guide and that no MPI functions should be called directly within PUMI source.

test/testing.cmake Outdated Show resolved Hide resolved
test/smokeTesting.cmake Outdated Show resolved Hide resolved
- test/smokeTesting.cmake: indent if statements
- test/testing.cmake: indent if statements in mpi_test.

Signed-off-by: Aiden Woodruff <[email protected]>
- CMakeLists.txt: update SCOREC_NO_MPI option helpstring.

Co-authored-by: Cameron Smith <[email protected]>
@bobpaw
Copy link
Collaborator Author

bobpaw commented Feb 6, 2025

Before I make my example project I have a feeling I will need to add a config file, called something like mpi_rpc.h (@cwsmith I would appreciate input on the name) which will be generated to make SCOREC_NO_MPI persistent. Then I can bundle the PCU.h #ifdef SCOREC_NO_MPI logic and the content in pcu_pnompi_types.h into one file.

@cwsmith
Copy link
Contributor

cwsmith commented Feb 7, 2025

We should probably just make a PUMI_config.h file (like this) and the first entry in it can be the SCOREC_NO_MPI flag. Over time, we can move away from using add_definition(...) to the config file approach.

- example/: new directory for example projects. I will move doc/ here.
- example/mpi-nompi/: new example to test an mpi app using a nompi build
  of SCOREC.
  - README.md: describe example.
  - hello.cc: print size/rank info and then create a box mesh like in
    the first example.
  - CMakeLists.txt: add build code (from doc/ example) and tests on 1,
    2, and 4 procs.
  - add tests for exact output.
- .github/workflows/cmake.yml: add GitHub action for the mpi-nompi
  example.

Signed-off-by: Aiden Woodruff <[email protected]>
@bobpaw bobpaw force-pushed the apw/nompi branch 3 times, most recently from bf506a7 to 5218f25 Compare February 10, 2025 03:03
Signed-off-by: Aiden Woodruff <[email protected]>
- example/mpi-nompi/hello.cc: add allreduce sum code.
- example/mpi-nompi/CMakeLists.txt: update test regexes.

Signed-off-by: Aiden Woodruff <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants