[Livestatus] Unexpected error after an arbiter reload #67
Comments
Hello, we still have this annoying problem. It gets worse as the infrastructure monitors more and more hosts. Regards,
I don't know about this specific issue, but why is restarting the broker necessary? Could you be more specific?
Hi, it's clearly a bug related to big infrastructures. To add some information about our workaround: we made an alias, shinken_reload, which does this:
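(The alias body seems to have been lost in the copy/paste; the sketch below only illustrates the idea, restarting the broker whenever the arbiter is reloaded. The init-script names and the order of the two commands are assumptions.)

```python
#!/usr/bin/env python
# Hypothetical sketch of a "shinken_reload" helper: reload the arbiter and
# bounce the broker (and therefore the Livestatus module and its regenerator)
# so it starts again from a clean state. Service names and command order are
# assumptions for a Debian-style setup; adapt them to your installation.
import subprocess

def shinken_reload():
    # Ask the arbiter to re-read and re-dispatch the configuration.
    subprocess.check_call(["service", "shinken-arbiter", "reload"])
    # Restart the broker so Livestatus rebuilds its regenerated objects.
    subprocess.check_call(["service", "shinken-broker", "restart"])

if __name__ == "__main__":
    shinken_reload()
```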
With this workaround, the platform seems more stable. I should also point out that our Shinken master isn't especially heavily loaded (load average = 2 on 8 PPC processors). Regards
Hi, I have EXACTLY the same architecture and the same problem. What should I do to debug this (Python debugger or something else)? Anyway, thanks, Shinken is the best solution. Some logs from when the bug occurs, here:
and here:
Hi, I knew I was not the only one ;) Well, the only thing I'm sure of is that something has to be done about Livestatus. Thanks in advance for your help and patience. Regards,
I've seen similar behavior on one of our Shinken instances. It is reliably triggered by an arbiter reload. For now, I've done an ugly patch to fix the symptoms:

```diff
diff -u /usr/local/lib/python2.7/dist-packages/shinken/misc/regenerator.py.old /usr/local/lib/python2.7/dist-packages/shinken/misc/regenerator.py
--- /usr/local/lib/python2.7/dist-packages/shinken/misc/regenerator.py.old 2016-03-09 17:39:57.874430134 +0000
+++ /usr/local/lib/python2.7/dist-packages/shinken/misc/regenerator.py 2016-03-09 17:39:12.920622557 +0000
@@ -503,7 +503,7 @@
         # Clean hosts from hosts and hostgroups
         for h in to_del_h:
             safe_print("Deleting", h.get_name())
-            del self.hosts[h.id]
+            #del self.hosts[h.id]
 
         # Now clean all hostgroups too
         for hg in self.hostgroups:
@@ -514,7 +514,7 @@
         for s in to_del_srv:
             safe_print("Deleting", s.get_full_name())
-            del self.services[s.id]
+            #del self.services[s.id]
 
         # Now clean service groups
         for sg in self.servicegroups:
```

This is by no means a fix, so I'm not submitting a PR. I'm also checking the installation itself.
Could you tell me which version of CherryPy you are on?
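(In case it helps, one quick way to check, assuming you can run Python from the broker's environment:)

```python
# Print the CherryPy version seen by the broker's Python environment.
import cherrypy
print(cherrypy.__version__)
```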
Hi, we're on CherryPy 3.8.0.
Any updates?
Upgrade to 2.4.3, man.
We're already on 2.4.3, man.
Hi all,
Same here, would be grateful for a fix.
Hello, I have the same issue on a professional project and it's very awkward in front of our customer. We're monitoring about 5K hosts and 20K services! Thanks in advance for your help!
+1 Hello all, same here.
I've had the same bug for a month.
Hi,
For a few days now, we have been encountering a new problem related to Livestatus.
We're running version 2.4.1 under Debian 8 with the following architecture:
The fact is that when we launch an arbiter reload, the broker goes mad because of the Livestatus module. The Thruk interface becomes unusable, although Livestatus itself still seems to be up.
Here is an example of the traceback in brokerd.log:
The only workaround we found consists of restarting the broker each time we want to reload the arbiter (and this workaround leads to serious memory leaks...).
So, to avoid replacing one problem with another, we searched and found that our issue could be related to issue #47.
We tried issuing the GET requests manually while everything was running fine, and Livestatus answered correctly:
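(The exact queries we used didn't make it into this report; the snippet below is just a sketch of the kind of manual GET request we mean, sent straight to the Livestatus module. The host, port and column list are assumptions — adapt them to your broker's livestatus module configuration.)

```python
# Python 2 sketch: send a raw Livestatus (LQL) query to the module over TCP
# and print the reply. 127.0.0.1:50000 and the column list are assumptions.
import socket

def livestatus_query(query, host="127.0.0.1", port=50000):
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(query)
        sock.shutdown(socket.SHUT_WR)  # signal end of query, then read the answer
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        return "".join(chunks)
    finally:
        sock.close()

print(livestatus_query("GET hosts\nColumns: name state\nOutputFormat: csv\n\n"))
```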
(it works for queries about contacts, services, etc. too)
Another thing we noticed is that it seems to occur when Livestatus is being queried heavily by Thruk: we never see these errors during the night or at weekends. So it might be related to the number of users/operators connected to Thruk.
Any help would be appreciated,
Regards