run worker process in launcher pod #612
Conversation
If you really want to put a launcher process in the same node as a worker, you can simply set requests to empty or something small, with the same behavior.
This is not intuitive at all.
Well, there are some reasons worth noting,
I'm not sure whether issue #503 requires this feature, though it may be related to it.
I agree that introducing a configuration option in the CRD may be better, but modifying the CRD is somewhat expensive. Personally, I'm totally OK with the statement that
So I accept the design.
An explicit API will almost always be better. Maybe something like But OK, I can understand that your cluster might be entirely homogeneous, in which case a separate pod could be detrimental. Any concerns @tenzen-y?
@alculquicondor @tenzen-y
@alculquicondor @kuizhiqing First of all, I believe that this feature would be worth it. Actually, in my cluster, the launcher is scheduled to CPU nodes with only Ethernet, and the workers are scheduled to GPU nodes with Ethernet and RoCE networks. Given the mixed network protocols, I sometimes fall into rabbit holes :( My only concern is the situation in which the launcher running a worker process fails due to a worker process error.
I think we should introduce a new field for this feature in this PR. As an alternative API name, we could add the Or, I think we could create a new field:

    runPolicy:
      spawnStrategy:
        launcher: <asWorker (withinWorker)|Independent>

@alculquicondor @kuizhiqing WDYT?
I wouldn't touch The name should match the behavior. What would you rather have?
I think option 1 is cleaner. Option 2 assumes that the operator has control over the entry-point, which is not true. And we would have to re-implement retry semantics, as opposed to using the Job API.
It sounds reasonable. I prefer option 1, too.
Yes, optionally it could be nil. Setting replicas=0 would require you to define a Pod spec, which is unnecessary.
@tenzen-y OK, let me continue working on it; tell me if you have more suggestions.
SGTM
@alculquicondor @tenzen-y ready for review, PTAL
/assign @alculquicondor
ACK
@kuizhiqing Can you address CI issues?
@tenzen-y Done
Sorry for the late response. I will come back here after the k/k enhancement freeze.
I've come back to this now.
Also, could you implement an integration test here: https://github.com/kubeflow/mpi-operator/blob/master/test/integration/mpi_job_controller_test.go?
Furthermore, could we support zero explicit workers?
pkg/controller/mpi_job_controller.go (outdated)

    switch mpiJob.Spec.MPIImplementation {
    case kubeflow.MPIImplementationOpenMPI:
        buffer.WriteString(fmt.Sprintf("%s%s-%d.%s.%s.svc slots=%d\n", mpiJob.Name, workerSuffix, i, workersService, mpiJob.Namespace, slots))
        buffer.WriteString(fmt.Sprintf("%s.%s.%s.svc slots=%d\n", name, workersService, mpiJob.Namespace, slots))
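(For context: with the new format string each hostfile entry has the form <pod-name>.<workers-service>.<namespace>.svc slots=N, and when runLauncherAsWorker is enabled the launcher pod gets an entry as well. Assuming, purely for illustration, a job named mpijob-sample in namespace default whose Service is also named mpijob-sample, with two workers and two slots each, the generated hostfile would look roughly like:)

    mpijob-sample-launcher.mpijob-sample.default.svc slots=2
    mpijob-sample-worker-0.mpijob-sample.default.svc slots=2
    mpijob-sample-worker-1.mpijob-sample.default.svc slots=2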
This is a nice refactoring :)
pkg/controller/mpi_job_controller.go (outdated)

    @@ -1291,6 +1302,13 @@ func updateDiscoverHostsInConfigMap(configMap *corev1.ConfigMap, mpiJob *kubeflo

        var buffer bytes.Buffer
        buffer.WriteString("#!/bin/sh\n")

        // We donnot check if launcher is running here, launcher should always be there or the job failed
Suggested change:

    - // We donnot check if launcher is running here, launcher should always be there or the job failed
    + // We don't check if launcher is running here, launcher should always be there or the job failed
Also, what happens if we use LauncherCreationPolicy=WaitForWorkersReady?
Since we leave it to the user to start the launcher process and worker 0's sshd, it does NOT change the workflow.
That makes sense.
@tenzen-y Thanks for your time reviewing. I've addressed your comments; please let me know if more modifications are needed. We do support the zero-explicit-workers setting, which I tested with the following example,
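(The original test spec isn't captured in this thread; a minimal sketch of such a zero-worker MPIJob, with hypothetical job name, image, and command, might look like this.)

    apiVersion: kubeflow.org/v2beta1
    kind: MPIJob
    metadata:
      name: pi-launcher-only            # hypothetical name
    spec:
      runLauncherAsWorker: true         # the launcher itself provides the only slots
      slotsPerWorker: 1
      mpiReplicaSpecs:
        Launcher:
          replicas: 1
          template:
            spec:
              containers:
              - name: launcher
                image: example.com/mpi-pi:openmpi   # hypothetical image
                command: ["mpirun", "-n", "1", "/home/mpiuser/pi"]
        # No Worker replica spec: zero explicit workers.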
As for the integration test: after I finished it, I realized that this feature does NOT change the overall workflow, so the integration test adds little, while the unit test does matter. Actually, this feature does two things:
Note that
@kuizhiqing Also, could you address my other comment below?
Also, could you implement an integration test here: https://github.com/kubeflow/mpi-operator/blob/master/test/integration/mpi_job_controller_test.go?
pkg/controller/mpi_job_controller.go (outdated)

    if (mpiJob.Spec.RunLauncherAsWorker != nil && *mpiJob.Spec.RunLauncherAsWorker) ||
        mpiJob.Spec.MPIImplementation == kubeflow.MPIImplementationIntel ||
        mpiJob.Spec.MPIImplementation == kubeflow.MPIImplementationMPICH {
Uhm, I'm wondering if we can make this condition a separate function like workersCanHaveDedicatedService, since we're using the same condition in several places.
Thanks for introducing me to ptr.Deref. The condition now appears once in the controller and twice in the tests, so I'm OK with keeping it in this PR without adding another function. WDYT?
I'm ok with either way.
Oh, I see. Thank you for the confirmation!
Uhm, it sounds reasonable given that we have a unit test, but let me know what @alculquicondor thinks.
@kuizhiqing Let me know once the build errors are fixed.
@tenzen-y Done
/lgtm
/assign @alculquicondor
Thanks @tenzen-y @alculquicondor, and I'll address the following in another PR
lgtm
pkg/controller/mpi_job_controller.go (outdated)

    @@ -656,13 +657,14 @@ func (c *MPIJobController) syncHandler(key string) error {
                return err
            }
        }
        if mpiJob.Spec.MPIImplementation == kubeflow.MPIImplementationIntel ||
        // If we want to run process in launcher, we should create a service for launcher.
We shouldn't create an additional service in the case of OpenMPI. Just allowing the existing Service to match the launcher pod should be enough.
The existing Service is created with the selector training.kubeflow.org/job-role: worker, so it can't be used for the launcher directly. Without creating a Service for the launcher, we would need to change the workers' Service and remove that selector.
Creating a Service is preferable to me. WDYT @tenzen-y @alculquicondor
I think it's more efficient to just change the selector.
@alculquicondor Do you mean changing the selector only in the case of OpenMPI, or in all cases?
I would just change it for all
Yes. Although I'm not sure if IntelMPI would work with just one Service, as it's very picky about the hostname. Worth trying.
OK, I will try it. Should we do the Service refactor in this PR or in a separate one, given that the current approach already works with the original Service design?
here should be fine
SGTM
@alculquicondor @tenzen-y I've added a Service that covers both the launcher and the workers; for IntelMPI, which needs to reach the launcher by hostname, I modified the searches part of the DNSConfig.
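(To illustrate the shape of that change, as a sketch under assumptions rather than the exact objects from this PR: the job's headless Service selects all of the job's pods instead of only the worker role, and the pod template's dnsConfig gains a search entry so a short hostname such as <job>-launcher resolves through that Service.)

    # Illustrative sketch only; names and label keys are assumptions based on the
    # discussion above, not copied from the PR.
    apiVersion: v1
    kind: Service
    metadata:
      name: mpijob-sample                                # hypothetical job name
    spec:
      clusterIP: None                                    # headless: per-pod DNS records
      selector:
        training.kubeflow.org/job-name: mpijob-sample    # no job-role selector, so launcher and workers both match

    # And in the pod template, roughly:
    #   dnsConfig:
    #     searches:
    #     - mpijob-sample.default.svc.cluster.local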
Implementation looks good, but can you check something manually?
- Create a long-running MPIJob using the released controller.
- Upgrade the controller to a build containing this PR
- Check that the MPIJob continues running.
I believe it should continue to run, but better safe than sorry.
@alculquicondor
That's what I thought, and it's OK. /approve @tenzen-y anything else?
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alculquicondor.
Ah, @tenzen-y already gave lgtm, so it will merge :)
Maybe we can add an integration test for this situation by enabling and disabling this feature, but I'm OK with working on it in another PR.
/lgtm
Feature:
One can run a worker process in the launcher pod by declaring the same resource requirements for the launcher as for the workers.
Why this?
For a cluster fully equipped with GPU nodes, the launcher pod occupies CPU resources, which prevents the workers from fully taking advantage of all the physical resources.
For a cluster with both CPU-only nodes and GPU nodes, efficiency may also suffer because of the network between the two kinds of nodes in most cases.
How?
We introduce a new entry in the spec, spec.runLauncherAsWorker. Don't forget to run the sshd service in the launcher in this mode; a minimal sketch is shown below.
This PR is part of the project #611.
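(A minimal sketch of how the new field might be used; spec.runLauncherAsWorker is the field introduced by this PR, while the job name, image, and resource values are hypothetical.)

    apiVersion: kubeflow.org/v2beta1
    kind: MPIJob
    metadata:
      name: train-launcher-as-worker            # hypothetical name
    spec:
      runLauncherAsWorker: true                 # new field from this PR
      slotsPerWorker: 8
      mpiReplicaSpecs:
        Launcher:
          replicas: 1
          template:
            spec:
              containers:
              - name: launcher
                image: example.com/my-mpi-image:latest   # hypothetical; per the description, sshd must also run here in this mode
                resources:
                  limits:
                    nvidia.com/gpu: 8           # same resource shape as the workers
        Worker:
          replicas: 3
          template:
            spec:
              containers:
              - name: worker
                image: example.com/my-mpi-image:latest   # hypothetical
                resources:
                  limits:
                    nvidia.com/gpu: 8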