2017 09 27
Wesley Bland edited this page Sep 27, 2017 · 1 revision
- Intel - Wesley
- Argonne - Ken, Yanfei
- UTK - Aurelien
- ORNL - Geoffroy
- Wesley made edits based on the feedback from the face-to-face.
- There are still a couple of very minor edits that need to be made
- Is it possible to use ULFM and Reinit at the same time?
- Not sure how they can be composed. Even if the smaller communicator used ULFM, the error handler for the larger communicator would still likely be triggered after a process failure, which would in turn trigger reinit.
- We don't think it's a problem to use error handlers, but if using `MPI_ERRORS_REINIT`, it would need to be consistent across all communicators.
- We still like using error handlers better than an API call:
  - It doesn't create a new API interface.
  - Changing the error handler is already required for process fault tolerance anyway.
- Aurelien - Write first draft of ULFM composability/recovery advice to have libraries repair MPI in one place.
- Aurelien - Merge `MPI_COMM_ISHRINK` branch
- Aurelien - Go back over other ULFM branches so we can discuss them next time
- Wesley - Go back through ULFM RMA discussions to see what we need to do (if anything) to move forward.
- Wesley - Improve slides for catastrophic errors to include example use cases