Splitter Executor
RFC #13
Author: A.Casajús
Last Modified: 2013-05-06
Up until now there have been two ways to split jobs in DIRAC:
- Client-side splitting: Client generates and submits to DIRAC n jobs
  - Pro: Client can do whatever is needed
  - Con: Takes a lot of time
  - Con: Needs n job submissions
  - Con: Can't cache results
- JobManager splitting: As soon as DIRAC's JobManager gets the job, it divides the job as required and returns a list of jids
  - Pro: Client doesn't have to do the splitting
  - Pro: Client receives all the jids at submission time
  - Con: Takes a lot of time and slows other submissions
  - Con: It's very restrictive on what can be done
  - Con: Difficult to extend the functionality
We'd like to get the best of both worlds: fast job submission, extensible splitting mechanisms, and the ability to take advantage of knowledge that DIRAC already has to speed up the splitting.
Users should be able to send a job, define how it has to be divided and let DIRAC do the splitting. The job manifest should include all the information DIRAC needs to know how to split it. Instead of the JobManager dividing the job at submission time, however, the job should be stored in the JobDB and divided asynchronously.
DIRAC will not know a priori how many jobs will be generated. That means that users will not know all the jids at submission time, but they will be able to request that information once the job has been divided.
Job splitting will be done by specific modules. Each module will divide the job in a different way, and users will define which module they want DIRAC to use when dividing their job. For instance, one module can do parametric jobs just like the JobManager does, while another can divide the Input Data based on where it is located. Any DIRAC extension can provide its own modules if needed.
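As a minimal sketch of what such a module could look like, assume a splitter takes one manifest and returns a list of child manifests. Everything here is hypothetical and chosen only for illustration: the SplitterModule base class, the split method, the Parameters and Parameter option names, and the use of a plain dict in place of a real DIRAC job manifest.

```python
from copy import deepcopy


class SplitterModule:
    """Hypothetical base class: one manifest in, a list of manifests out."""

    def split(self, manifest):
        raise NotImplementedError


class ParametricSplitter(SplitterModule):
    """Creates one child manifest per entry of the assumed 'Parameters' option."""

    def split(self, manifest):
        children = []
        for value in manifest.get("Parameters", []):
            child = deepcopy(manifest)
            # A child is an ordinary job, so the splitting options are dropped.
            child.pop("Splitter", None)
            child.pop("Parameters", None)
            # 'Parameter' is an assumed option name carrying this child's value.
            child["Parameter"] = value
            children.append(child)
        return children


# A dict stands in for the job manifest (assumption).
parent = {"Executable": "run.sh", "Splitter": "Parametric", "Parameters": [1, 2, 3]}
children = ParametricSplitter().split(parent)
```

A data-driven splitter would implement the same split method but group the input files by location instead of iterating over a parameter list.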
Jobs will define which module has to be used when splitting them. This is done by defining a Splitter option in the manifest. The value of this option is the name of the module to use. For example, Splitter=Parametric will use the ParametricSplitter module to divide the job.
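The mapping from the Splitter option to a module name can be sketched as follows. The only rule taken from the text is that Parametric maps to ParametricSplitter; generalising it to "append Splitter to the option value" is an assumption, as are the function name and the dict used in place of a real manifest.

```python
def splitter_module_name(manifest):
    """Return the splitter module name for a manifest, or None if it has no Splitter option.

    Assumes the module name is the option value followed by 'Splitter'.
    """
    value = manifest.get("Splitter")
    if not value:
        return None
    return value + "Splitter"


# A dict stands in for the job manifest (assumption).
module_name = splitter_module_name({"Executable": "run.sh", "Splitter": "Parametric"})
```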
Each splitter module will do different things.
Since splitter modules can define new options in the manifest, there must be a way for users to define where these new options have to be used.
To do so, a new Optimizer has to be created. This optimizer will look at the manifest, check that the requested splitting module exists, run the manifest through it and submit the resulting manifests to the Mind to be stored in the JobDB as new jobs.
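The optimizer steps described here (look up the module, check that it exists, split, submit the results) could look roughly like the sketch below. It is not DIRAC's executor API: FakeMind, optimize_split, the SPLITTERS registry, the Parameters and Parameter options, and the dict-based manifest are all hypothetical stand-ins.

```python
class FakeMind:
    """Stand-in for the Mind: records every manifest it is asked to store."""

    def __init__(self):
        self.stored = []

    def submit(self, manifest):
        self.stored.append(manifest)
        return len(self.stored)  # hypothetical jid of the new job


def split_by_parameters(manifest):
    """Toy splitter: one child manifest per entry of the assumed 'Parameters' option."""
    base = {k: v for k, v in manifest.items() if k not in ("Splitter", "Parameters")}
    return [dict(base, Parameter=value) for value in manifest.get("Parameters", [])]


# Hypothetical registry of the splitter modules known to this installation.
SPLITTERS = {"ParametricSplitter": split_by_parameters}


def optimize_split(manifest, mind, splitters=SPLITTERS):
    """Check the requested splitter exists, split the manifest, submit the children."""
    requested = manifest.get("Splitter")
    if requested is None:
        raise ValueError("Manifest has no Splitter option")
    splitter = splitters.get(requested + "Splitter")
    if splitter is None:
        raise ValueError("Unknown splitter module: %sSplitter" % requested)
    # Each resulting manifest is handed to the Mind to be stored as a new job.
    return [mind.submit(child) for child in splitter(manifest)]


mind = FakeMind()
jids = optimize_split({"Splitter": "Parametric", "Parameters": ["a", "b"]}, mind)
```

Because the child jids only exist after this step has run, a client would have to ask for them later rather than receive them at submission time.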