Noble Numbat stemcell no longer uses monit #320
👍 Another vote here to get rid of monit, hopefully in favor of something better. Below I want to describe the issues the App Runtime Platform WG has had with monit. The monit version is old. It is so old that it does not support HTTP health checks, so we have written our own health checkers.
✨ It would be great if we didn't have to maintain our own tools to do this health checking!
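For illustration, a minimal HTTP health probe of the kind monit cannot express might look like the Python sketch below. It is not the WG's actual health checker: `http_healthy` and `HealthTracker` are hypothetical names, and the "restart after N consecutive failures" policy is an assumption, loosely modeled on monit's `for N cycles` checks.

```python
import http.client
import urllib.error
import urllib.request

# Bypass any http_proxy settings: a health check should hit the local process directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url answers with a 2xx status within timeout seconds."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Non-2xx answers raise HTTPError (a URLError subclass); refused
        # connections and timeouts surface as URLError or OSError.
        return False


class HealthTracker:
    """Hypothetical restart policy: restart after max_failures consecutive failed probes."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures = 0

    def record(self, healthy: bool) -> bool:
        """Record one probe result and return True when a restart should be triggered."""
        self.failures = 0 if healthy else self.failures + 1
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting afresh after the restart
            return True
        return False
```

A supervisor loop would call `tracker.record(http_healthy(url))` on an interval and invoke whatever restart mechanism the process manager offers when it returns `True`.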
Was moving to a different service manager, such as systemd, discussed? Reading the document, it sounds like the proposed way forward is bpm only. We have use cases that require processes to run directly in the host namespaces. In our setup with haproxy-boshrelease we have sidecars and end up mounting or sharing most of the things bpm isolates just to make them accessible again, which defeats the purpose of the isolation. There are other features bpm currently prevents us from using, such as CPU pinning (there is only a single cgroup containing all CPUs, and no option to change that at the moment). Personally, I'm struggling to see the need to isolate those processes in containers, since we already spawn dedicated VMs for deployments/jobs (this is not to say that there are no valid use cases for it). In a bit more detail:
We have use cases that require processes to run directly on the host, which, as I understand it, is not possible with BPM?
I've taken a look at the docs and don't think they cover our use case (but I haven't tested it due to time constraints, sorry).
HTTP health checks are not something systemd offers; the "systemd-way" would be … To be clear: I'm not saying systemd is the only way, but I'm afraid bpm in its current form cannot address all use cases adequately, and if we are making a breaking change anyway, we might want to look at well-established tools instead of creating our own.
Yeah, I'm not in favor of creating a process monitor. That's a "solved" problem, and considering how painful most of the solutions are, it's not an easy problem to solve. One thing that BPM does bring is a generic YAML structure for describing how to run a bosh job. I think that would be a good starting point for a monit replacement. Since a lot of bosh jobs already have a bpm.yml file, it would be easy for them to migrate. The agent could parse that file and then create a systemd (or other) config to run the job. There could still be a flag for … What we don't want to do is "monit 2.0", where release authors write systemd service files that we then take and hand to systemd directly. We should have a translation layer so we have the flexibility to change the process runner/monitor at a later date.
BPM can evolve to support use cases it does not cover yet. We've done so in the past, and I've personally participated in a couple of those evolutions: one for the JVM native code generator that required execution ability (we've added the …). So @maxmoehl, I really encourage you to test the … If we do identify an unsupported use case, then I'm pretty sure we can get BPM to evolve and support it!
BPM accommodates many workloads, but some that use cgroups, like ContainerD, need extra shims when run on stemcells that use SystemD. I'll give a talk tomorrow at CF Day EU on a pattern I've come up with for running ContainerD with BPM. Not all stemcells run SystemD; Bosh Lite stemcells are an exception. If we built something where SystemD is a hard requirement, we might break Bosh Lite, and that would be sad.
I know @rkoster recently had some success running SystemD as part of a docker-cpi stemcell. The same approach might also apply to the warden-cpi. But if we have an abstraction layer between the bosh release and the process manager, the bosh-agent could always have multiple process manager implementations, including one that still works in bosh-lite. It might lag behind, but it should be possible.
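One way to picture this abstraction layer is an interface the agent codes against, with a systemd-backed implementation and a plain-subprocess fallback for environments such as bosh-lite where systemd may not be running. This is a hypothetical Python sketch, not bosh-agent's actual design: the class names, the `bosh-<job>-<process>.service` naming scheme and the fallback behavior are all assumptions. The systemd check mirrors what `sd_booted(3)` does, which is to test for the `/run/systemd/system` directory.

```python
import os
import subprocess
from abc import ABC, abstractmethod
from typing import Callable


class ProcessManager(ABC):
    """Hypothetical backend-neutral interface the agent would program against."""

    @abstractmethod
    def start(self, job: str, name: str, argv: list[str]) -> None: ...

    @abstractmethod
    def stop(self, job: str, name: str) -> None: ...

    @abstractmethod
    def running(self, job: str, name: str) -> bool: ...


class SystemdManager(ProcessManager):
    """Drives systemd through systemctl; `run` is injectable so it can be faked in tests."""

    def __init__(self, run: Callable[..., object] = subprocess.run) -> None:
        self._run = run

    @staticmethod
    def _unit(job: str, name: str) -> str:
        return f"bosh-{job}-{name}.service"  # hypothetical naming scheme

    def start(self, job: str, name: str, argv: list[str]) -> None:
        # argv is assumed to be baked into a unit file already rendered from bpm.yml.
        self._run(["systemctl", "start", self._unit(job, name)], check=True)

    def stop(self, job: str, name: str) -> None:
        self._run(["systemctl", "stop", self._unit(job, name)], check=True)

    def running(self, job: str, name: str) -> bool:
        result = self._run(["systemctl", "is-active", "--quiet", self._unit(job, name)])
        return result.returncode == 0


class DirectManager(ProcessManager):
    """Fallback without systemd: plain child processes, no restart-on-crash supervision."""

    def __init__(self) -> None:
        self._procs: dict[tuple[str, str], subprocess.Popen] = {}

    def start(self, job: str, name: str, argv: list[str]) -> None:
        self._procs[(job, name)] = subprocess.Popen(argv)

    def stop(self, job: str, name: str) -> None:
        proc = self._procs.pop((job, name), None)
        if proc is None:
            return
        proc.terminate()
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if SIGTERM was ignored
            proc.wait()

    def running(self, job: str, name: str) -> bool:
        proc = self._procs.get((job, name))
        return proc is not None and proc.poll() is None


def select_manager() -> ProcessManager:
    # Same test sd_booted(3) uses: the directory exists only when systemd is the init system.
    if os.path.isdir("/run/systemd/system"):
        return SystemdManager()
    return DirectManager()
```

The fallback deliberately does less than the systemd backend (it never restarts crashed processes), which matches the idea that a bosh-lite implementation "might lag behind" while still sharing the same interface.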