Performance: still struggles on larger clusters #461
I think we have solved this in the prerelease channel. Will test again. A new prerelease is going out now, and if it's all successful we'll see a 0.25.1 tomorrow morning. Sorry to anyone on the prerelease channel who we may have impacted with broken features over the past week or two 🙏
The performance in the prerelease channel is much better now. All of the known regressions have been fixed except one. We're going to try to resolve that one, and maybe add some additional performance metrics collection, before the next minor.
In the latest prerelease there is still the occasional hang, which seems to get worse with multiple instances running at once, and the one remaining known regression is still there. But across repeated trials the performance is consistently very good, the occasional hang reliably recovers after the scheduled timeout, and the at-rest performance isn't crushing my CPU like it always used to.

I'd love to figure out what causes the occasional hang before we push this out to the release channel, but I'm afraid we already crossed that bridge with 0.25.0, and we'd better just go ahead and release what we have now. It's seriously much better. If you had problems in 0.25.0, please try the prerelease. I'm planning on throwing the release lever tomorrow. There are a lot of changes in here; I'm putting the changelog together now.

If anyone out there is listening, please do your worst (test and complain) on the prerelease channel. Be sure you have the latest updates; we've really overhauled the internals, and the responsiveness and performance are drastically better now.
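For illustration only, here is a minimal TypeScript sketch of one way a hung CLI query can recover after a scheduled timeout, as described above. This is not the extension's actual code; `runWithTimeout` and the `kubectl` invocation are assumptions, and the only real API used is Node's `child_process.execFile` with its `timeout` option.

```typescript
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

// Hypothetical helper: run a read-only CLI query, but never let a hang block the caller.
async function runWithTimeout(cmd: string, args: string[], timeoutMs = 15_000): Promise<string | undefined> {
  try {
    // execFile's `timeout` option kills the child process after timeoutMs.
    const { stdout } = await execFileAsync(cmd, args, { timeout: timeoutMs });
    return stdout;
  } catch (err) {
    // On timeout or failure, treat it as "no data this cycle" so the UI
    // keeps its previous state instead of waiting forever.
    console.warn(`'${cmd} ${args.join(' ')}' failed or timed out:`, err);
    return undefined;
  }
}

// Usage: a refresh cycle that tolerates an occasional hang.
runWithTimeout('kubectl', ['get', 'kustomizations', '-A', '-o', 'json']).then(json => {
  if (json) {
    /* parse and hand the result off to the UI */
  }
});
```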
Closing, v0.25.1 is out and performance is much better now. |
Unsure what the bottleneck is, but we have several test clusters and test workstations, and while the extension performs very well on the smaller leaf cluster `limnocentral`, it unfortunately performs very poorly on the `management` cluster. I think the problem happens when there are more Flux Kustomizations than available concurrent CPU threads (I saw issues on my workstation, an M1 with 8 cores, when there are 13 Kustomizations and 7 HelmReleases). After we analyzed the problem, we decided that we need some kind of double-buffer and render loop to prevent this from creating a nasty blocking lag in the UI.
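A minimal sketch of that double-buffer and render loop idea, not the extension's actual implementation: the names (`ClusterSnapshot`, `fetchKustomization`, `redrawTreeView`) are hypothetical. Queries run with bounded concurrency so a large set of Kustomizations can't oversubscribe an 8-core machine, and the UI only ever reads a completed snapshot that is swapped in atomically.

```typescript
import * as os from 'os';

type ClusterSnapshot = { kustomizations: unknown[]; fetchedAt: number };

// The "front buffer" is the only data the UI ever reads.
let frontBuffer: ClusterSnapshot = { kustomizations: [], fetchedAt: 0 };

// Run async jobs with at most `limit` in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(workers);
  return results;
}

// Producer: build the back buffer off the UI path, then swap it in.
async function refresh(names: string[], fetchKustomization: (name: string) => Promise<unknown>) {
  const limit = Math.max(1, os.cpus().length - 1); // leave headroom for the UI
  const kustomizations = await mapWithConcurrency(names, limit, fetchKustomization);
  frontBuffer = { kustomizations, fetchedAt: Date.now() }; // atomic swap
}

// Consumer: the render loop only reads the front buffer, so a slow or hung
// query can never create a blocking lag in the UI.
function startRenderLoop(redrawTreeView: (snapshot: ClusterSnapshot) => void, intervalMs = 1000) {
  let lastRendered = 0;
  return setInterval(() => {
    if (frontBuffer.fetchedAt > lastRendered) {
      lastRendered = frontBuffer.fetchedAt;
      redrawTreeView(frontBuffer);
    }
  }, intervalMs);
}
```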
If you're experiencing similar issues, please chime in here so we can get your feedback. It would help to validate the issue with another user; it'll be good to have a stronger resolution than "works on my machine now" when we already have such inconsistent repro across workstations today.
These ideas are planned for next week; there's a larger refactor and retooling underway, and this work will come as part of that delivery. We will likely have something on the prerelease channel before the main release. I don't think we'll do 0.26 until these performance issues are basically worked through in the 0.25.x branch.
Related: