
RFC - list of parallel (multi.jl) enhancements #3340

Closed
4 of 9 tasks
amitmurthy opened this issue Jun 10, 2013 · 7 comments
Labels
parallelism Parallel or distributed computation

Comments

@amitmurthy
Contributor

Just starting a discussion on parallel (multi.jl) enhancements I could think of.

  • new method - rmprocs, the counterpart to addprocs

  • master process - cleanly remove terminated worker processes from the process list

  • worker process - terminate self if the master connection is broken

  • new method - procs to return a list of active worker procs.
    Consequently, the way to distribute work across remote workers will be
    for pid in procs() and not for n in 1:nprocs()

  • introduce named process groups - e.g.,
    addprocs(pg="odbc_workers", ....) will create a named process group called
    "odbc_workers". This is for situations where we want to dedicate a pool of
    workers to a specific task, in this example ODBC. libodbc is a blocking API and
    the non-blocking version is not trivial to merge with our event loop.
    (See the usage sketch after this list.)

    Having named process groups is akin to starting a dedicated "thread pool"
    in other languages. We would thus create a dedicated out-of-process
    worker pool for ODBC requests, put a queue in front of it and
    provide a non-blocking interface in the master process.

  • @parallel, pmap, etc. to support process groups

  • shared memory enhancements: a macro @shm_type would declare
    and set up an array type in all worker processes pointing to
    the same shared memory segment.

  • shared memory support - add Windows support and better cleanup methods on Mac/Linux

  • headless clusters - provide the ability to start a bunch of remote
    processes, start a long-running computation and detach from the cluster.
    At any time, it should be possible to connect to the cluster and query the
    state of the computation. "Connecting" to the cluster is via a local
    julia process, by specifying any of the remote worker processes as
    a parameter. The local julia process then effectively becomes a
    shell into the remote cluster. NOTE: this would also probably require
    a port-mapper daemon (on a known port) to be running on the remote nodes.
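
To make the process-group idea concrete, here is a rough usage sketch. Everything in it - the pg keyword, procs("name"), rmprocs - is a proposal from the list above rather than an existing interface, and run_query stands in for whatever ODBC call a worker would run:

```julia
# Illustrative only: these signatures are the proposals above, not current multi.jl.

# dedicate a pool of workers to blocking libodbc calls
addprocs(4, pg="odbc_workers")

# iterate over the live members of the group instead of 1:nprocs()
for pid in procs("odbc_workers")
    println("odbc worker ready: $pid")
end

# hand a blocking query to the pool; run_query is a placeholder that would
# have to be defined on the workers
ref = remotecall(procs("odbc_workers")[1], run_query, "select * from accounts")

# tear the pool down once the work is finished
rmprocs(procs("odbc_workers"))
```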

Opinions? Additions?

@ViralBShah
Member

It would be nice to have collective communications on process groups, MPI style - bcast, reduce, allreduce, gather, allgather, alltoall, and barrier.
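
For concreteness, collectives over a process group might look roughly like this; every name below is hypothetical and only sketches the shape of the API:

```julia
# hypothetical API sketch - none of these functions exist yet
pg = procs("compute")            # members of a named group, as proposed above

bcast(pg, params)                # send the same value to every member
s  = reduce(pg, +, localval)     # combine one value per member on the caller
s2 = allreduce(pg, +, localval)  # same, but every member receives the result
xs = gather(pg, localval)        # collect one value per member on the caller
barrier(pg)                      # block until every member reaches this point
```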

The ability to handle exceptions from processes that crash/die would also be nice. In particular, pmap and @parallel could be made to tolerate disappearing workers.
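
A minimal sketch of that second point, assuming the procs() proposed above and the current remotecall_fetch(pid, f, args...) calling convention; tolerant_pmap and the retry limit are illustrative, not an existing interface:

```julia
# Retry a failed chunk on another live worker instead of aborting the whole map.
function tolerant_pmap(f, xs)
    results = cell(length(xs))
    for (i, x) in enumerate(xs)
        for attempt in 1:3                     # bounded retries per chunk
            pid = procs()[rand(1:end)]         # any currently live process
            try
                results[i] = remotecall_fetch(pid, f, x)
                break
            catch err
                # the worker may have died mid-call; retry the chunk on another
                # live process, giving up after the last attempt
                attempt == 3 && rethrow(err)
            end
        end
    end
    results
end
```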

@StefanKarpinski
Member

This is a great list. Glad you guys are tackling this.

@Keno
Member

Keno commented Jun 10, 2013

I'm working on mmap support for Windows. Should have something done over the next couple of days.

@JeffBezanson
Member

Agree; good list.

@mlubin
Member

mlubin commented Jun 10, 2013

I'll mention #2847 while we're at it.

@ViralBShah
Member

I would like @parallel to be able to do dynamic scheduling as well, like pmap.
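
One way that could work, sketched with @async/@sync tasks: keep a shared counter on the master and let each worker pull the next iteration as soon as it is free, the same strategy pmap uses. dynamic_parallel_sum is an illustrative name, and f is assumed to be defined on every worker (e.g. via @everywhere):

```julia
# Dynamic scheduling: workers pull iterations instead of getting a static split.
function dynamic_parallel_sum(f, n::Int)
    i = 0
    total = 0.0
    @sync for pid in procs()
        @async while true
            # tasks on the master run cooperatively, so the counter needs no lock
            i >= n && break
            myi = (i += 1)                     # claim the next iteration
            v = remotecall_fetch(pid, f, myi)  # blocks this task only
            total += v                         # no yield between read and write
        end
    end
    total
end
```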

@amitmurthy
Contributor Author

Closing this; it has become mostly redundant. The parallel roadmap is being discussed in #9167
