circus watchers need an async_kill option in some cases #987
Comments
This conflict error is something that comes up pretty often. I'm not sure what the solution is, but this is clearly something we have to fix. Actually, I'm wondering why we are blocking other commands during @thefab : Do you see any major blocking point making this impossible?
I came to the same analysis. But split I'm currently working on a more pragmatic change. For the moment, I'm introducing this option at the watcher level:
The main problem, in our use case, is that the graceful_timeout is blocking the complete execution of With async_kill=True and very few changes to deal with that new option, the This mitigates the conflict error because
#988 is deployed on our integration server and works well (after a few days).
OK, it's not the perfect solution to the "manage_watchers monolith", but I can't work on that for the moment: it's too big and too risky for us right now.
The #988 proposal mitigates the problem. In our particular use case, it's a huge win. But are you interested in ?
I'm waiting for your decision. Thanks
I didn't have time to take a deep look at it. From what I've seen, it seems to be a short change and a big win, so I think this could be merged. Anyway, I'm trying to fix our tests before accepting new PRs, and I'm starting to see the light at the end of the tunnel :)
OK, perfect. I'm going to wait for your "tests fix" before breaking them with my changes ;-)
When you have several watchers with:
- graceful_timeout (600 seconds in our case)
- max_age values > 0

The call of manage_watchers of the Arbiter from the Controller can be very long (600 seconds in our case). During this delay, circus is not blocked (thanks to tornado ioloop) but:
- manage_watchers is running)
- manage_watchers is launched

So during this graceful 600s period, other dead or expired processes (even for other watchers) are not (re)launched anymore.
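To make the stall concrete, here is a minimal sketch of the behaviour described above. It is not circus code: it uses the standard asyncio library instead of tornado, the functions `graceful_kill` and `manage_watchers` are hypothetical stand-ins, and the 600 s timeout is shrunk to 0.2 s so it runs quickly. It only shows that when one management pass awaits each graceful kill in turn, the event loop stays free but the remaining watchers are not handled until the kill completes.

```python
import asyncio
import time

GRACEFUL_TIMEOUT = 0.2  # assumption: stands in for the 600 s of the report


async def graceful_kill(name, log):
    # Hypothetical stand-in for waiting out the graceful period of a process.
    await asyncio.sleep(GRACEFUL_TIMEOUT)
    log.append(("killed", name))


async def manage_watchers(watchers, log):
    # One management pass: every graceful kill is awaited before moving on,
    # so later watchers wait for earlier ones.
    for name, expired in watchers:
        if expired:
            await graceful_kill(name, log)
        log.append(("managed", name))


async def main():
    log = []
    start = time.monotonic()
    await manage_watchers([("slow", True), ("other", False)], log)
    return log, time.monotonic() - start


log, elapsed = asyncio.run(main())
print(log, round(elapsed, 2))
```

In this toy model the watcher named "other" has nothing to kill, yet it is only managed after the full graceful period of "slow" has elapsed.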
The real fix is really intrusive and complex because of concurrency issues.
We are working on a small fix based on an "async_kill" flag (at the watcher level).
Ideas and advice are welcome.
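One way such a flag could behave is sketched below. This is an illustration under assumptions, not the implementation in #988: it again uses asyncio rather than tornado, and `graceful_kill`, `manage_watchers`, and the per-watcher `async_kill` boolean are hypothetical names modelled on the description above. With the flag set, the kill is scheduled as a background task and the management pass continues without waiting for it.

```python
import asyncio

GRACEFUL_TIMEOUT = 0.2  # assumption: stands in for a long graceful_timeout


async def graceful_kill(name, log):
    # Hypothetical stand-in for waiting out the graceful period of a process.
    await asyncio.sleep(GRACEFUL_TIMEOUT)
    log.append(("killed", name))


async def manage_watchers(watchers, log, pending):
    for name, expired, async_kill in watchers:
        if expired:
            if async_kill:
                # Schedule the kill and keep going; it finishes in the background.
                pending.append(asyncio.create_task(graceful_kill(name, log)))
            else:
                # Original behaviour: the whole pass waits for the kill.
                await graceful_kill(name, log)
        log.append(("managed", name))


async def main():
    log, pending = [], []
    await manage_watchers(
        [("slow", True, True), ("other", False, False)], log, pending
    )
    after_pass = list(log)          # state right after the management pass
    await asyncio.gather(*pending)  # let the background kills finish
    return after_pass, log


after_pass, final_log = asyncio.run(main())
print(after_pass)
print(final_log)
```

The trade-off in this sketch is that the pass no longer knows when the kill has finished, so any code that relies on the process being gone has to wait on the pending task instead.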