RFC - list of parallel (multi.jl) enhancements #3340
Comments
It would be nice to have collective communications on process groups, MPI style. The ability to handle exceptions from processes that crash/die would also be nice. Especially,
This is a great list. Glad you guys are tackling this.
I'm working on mmap support for Windows. Should have something done over the next couple of days.
Agree; good list.
I'll mention #2847 while we're at it.
I would like
Closing this, has become mostly redundant. Parallel roadmap being discussed in #9167
Just starting a discussion on parallel (multi.jl) enhancements I could think of.
new method - rmprocs, the counterpart to addprocs
master process - cleanly remove terminated worker processes from the process list
worker process - terminate self if the master connection is broken
new method - procs, to return a list of active worker procs. Consequently, the way to distribute work across remote workers will be for pid in procs() and NOT for n in 1:nprocs()
introduce named process groups - e.g.,
addprocs(pg="odbc_workers", ....) will create a named process group called
"odbc_workers". This is for situations where we want to dedicate a pool of
workers to a specific task, in this example ODBC. libodbc is a blocking API and
the non-blocking version is not trivial to merge with our event loop.
Having named process groups is akin to starting a dedicated "thread pool"
in other languages. We would thus create a dedicated out-of-process
worker pool for ODBC requests, put a queue in front of it and
provide a non-blocking interface in the master process.
@parallel, pmap, etc. to support process groups
shared memory enhancements : macro @shm_type would declare and set up an array type in all worker processes pointing to
the same shared memory segment.
shared memory support - Windows support. Better cleanup methods on Mac/Linux
headless clusters - provide an ability to start a bunch of remote
processes, start a long-running computation and detach from the cluster.
At any time, it should be possible to connect to the cluster and query
the state of the computation. "Connecting" to the cluster is via a local
julia process, by specifying any of the remote worker processes as
a parameter. The local julia process then effectively becomes a
shell into the remote cluster. NOTE : This would also probably require
a port-mapper daemon (on a known port) to be running on the remote nodes.
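To make the procs() and process-group items above a bit more concrete, here is a rough sketch. It assumes a Julia version where these functions are provided by the Distributed standard library, which is an assumption and not what multi.jl looked like when this list was written. The pg= keyword proposed above does not appear here; a WorkerPool is used purely as a stand-in for a named group, and the name odbc_workers is hypothetical.

```julia
using Distributed

# Start three local workers, then remove one so the pids are no longer contiguous.
pids = addprocs(3)
rmprocs(pids[2])

# Iterate over the live worker pids. Looping over 1:nprocs() here would
# include the pid that was just removed and skip the highest live one.
live = workers()
ids = [remotecall_fetch(myid, pid) for pid in live]

# Stand-in for a named process group: a pool dedicated to one kind of task.
# (Hypothetical name; the proposal would use addprocs(pg="odbc_workers", ....).)
odbc_workers = WorkerPool(live[1:1])

# pmap restricted to the dedicated pool; each result records which worker ran it.
results = pmap(x -> (myid(), x^2), odbc_workers, 1:4)
```

Every tuple in results should carry the pid of the single pooled worker, so the rest of the workers stay free for other work while the blocking calls are queued on the dedicated pool.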
Opinions? Additions?