Jaeger version check leads to delayed Kiali login screen until timeout occur #8100

Open
mimek opened this issue Jan 28, 2025 · 9 comments · May be fixed by #8131
Labels
bug Something isn't working

Comments

@mimek
mimek commented Jan 28, 2025

Describe the bug

The Jaeger version check strips the port from internal_url and assumes the default HTTP port 80 (or 443 for HTTPS), which is not always desirable. For example, in a k8s environment where Jaeger is deployed with jaeger-operator, it is not possible to make the query service listen on port 80. As a result, Kiali -> Jaeger queries and the integration work fine (through gRPC), but when Kiali loads for the first time the Jaeger version check times out (because of the incorrect port) and the Kiali UI start is delayed by the 10-second timeout.

During the Jaeger version check, the browser shows the following for 10 seconds:

Image

In the logs of Kiali we can see:
2025-01-28T22:23:08Z INF jaeger version check failed: url=[http://jaeger-operator-jaeger-query.tracing.svc.cluster.local], code=[0], err=[Get "http://jaeger-operator-jaeger-query.tracing.svc.cluster.local": context deadline exceeded (Client.Timeout exceeded while awaiting headers)]

Expected Behavior

a) The version check should be configurable so that it can be disabled entirely.
b) The version check should be configurable to allow specifying the port (instead of stripping it and assuming the default).
c) The timeout should be configurable, or should not affect loading the Kiali UI at all.

What are the steps to reproduce this bug?

  1. Enable tracing with Jaeger
  2. Define in_cluster_url for tracing on non standard 80 port
  3. Reload the Kiali UI and observe the loading screen delay and the version check timeout in the logs

Environment

  • Kiali version: 2.4.0
  • Istio version: 1.24
  • Kubernetes impl: k3d
  • Kubernetes version: 1.30.4
  • Other notable environmental factors: Jaeger deployed with jaeger-operator, where the query service listens on port 16686
@mimek mimek added the bug Something isn't working label Jan 28, 2025
@jmazzitelli
Collaborator

> Define in_cluster_url for tracing on non standard 80 port

This probably has nothing to do with it, but in Kiali 2.x, in_cluster_url is deprecated in favor of internal_url.

The version check was changed recently though - perhaps some problem was introduced here?

@mimek
Author

mimek commented Jan 28, 2025

> > Define in_cluster_url for tracing on non standard 80 port
>
> This probably has nothing to do with it, but in Kiali 2.x, in_cluster_url is deprecated in favor of internal_url.

Yes, my fault - we're utilizing internal_url of course.

@frittentheke
frittentheke commented Jan 30, 2025

The code causing this is likely here: https://github.com/kiali/kiali/blob/v2.4.0/status/versions.go#L98-L112
In the case of gRPC the port is stripped off the URL, but there is no way to determine the HTTP port, so ports 80/443 are used statically.
... https://github.com/kiali/kiali/blob/v2.4.0/status/versions.go#L103
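
To illustrate the effect described above, here is a minimal Go sketch. It is not the code from status/versions.go: `versionCheckURL` and its `httpPort` parameter are hypothetical, and port 16685 in the example URL is only an assumed gRPC port. With `httpPort == 0` the port is dropped and the client falls back to the scheme default, which is the reported behaviour; with an explicit port the check would target the query service directly.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
	"strings"
)

// versionCheckURL derives the URL for an HTTP version check from the
// configured tracing URL. Hypothetical helper, not Kiali's actual code.
//
// httpPort == 0 reproduces the behaviour reported in this issue: the port is
// dropped, so the client uses the scheme default (80 for http, 443 for https).
// httpPort > 0 replaces the port with an explicitly configured one.
func versionCheckURL(internalURL string, httpPort int) (string, error) {
	u, err := url.Parse(internalURL)
	if err != nil {
		return "", err
	}
	host := u.Hostname() // host without port and without IPv6 brackets
	switch {
	case httpPort > 0:
		u.Host = net.JoinHostPort(host, strconv.Itoa(httpPort))
	case strings.Contains(host, ":"):
		u.Host = "[" + host + "]" // restore brackets for IPv6 literals
	default:
		u.Host = host
	}
	return u.String(), nil
}

func main() {
	// 16685 is an assumed gRPC port for this example; 16686 is the query
	// service port mentioned in the report.
	const internal = "http://jaeger-operator-jaeger-query.tracing.svc.cluster.local:16685"

	stripped, _ := versionCheckURL(internal, 0)
	explicit, _ := versionCheckURL(internal, 16686)
	fmt.Println(stripped) // no port: the request goes to port 80
	fmt.Println(explicit) // explicit port: the request goes to port 16686
}
```

An explicit port like this would correspond to expected behaviour (b) in the report; whether Kiali should expose such a setting is a design decision for the maintainers.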

@jshaughn
Collaborator

This may be a duplicate of #8106.

@josunect josunect self-assigned this Feb 5, 2025
@josunect josunect linked a pull request Feb 6, 2025 that will close this issue
@josunect
Contributor

josunect commented Feb 6, 2025

I think the status API call should not block the login screen, so I've created a PR for this: #8131

Regarding why the check URL does not work when Jaeger is not deployed on the standard port (80): this happens when gRPC is used, as gRPC doesn't have an endpoint that returns the version. To avoid an additional configuration URL, Kiali tries to obtain the version from the HTTP endpoint, using the same URL as gRPC but with the standard port (@jmazzitelli, I think that was the decision about this: #7746 (comment)). Probably this should be changed.
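
One way to keep a slow version check from blocking callers is sketched below in Go. This is not the change made in #8131; `versionCache`, `versionFunc`, `unreachable`, `fixed`, and `defaultTimeout` are all hypothetical names. The check runs in the background with a timeout, and readers always get the last known value immediately.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// defaultTimeout mirrors the 10-second delay seen in the report.
const defaultTimeout = 10 * time.Second

// versionFunc stands in for the HTTP call to the tracing backend.
type versionFunc func(ctx context.Context) (string, error)

// versionCache stores the last successfully retrieved version.
type versionCache struct {
	mu      sync.RWMutex
	version string
}

// Get never blocks on the network; it returns "unknown" until a check succeeds.
func (c *versionCache) Get() string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	if c.version == "" {
		return "unknown"
	}
	return c.version
}

// Refresh runs check with a timeout and keeps the old value on failure.
func (c *versionCache) Refresh(check versionFunc, timeout time.Duration) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	v, err := check(ctx)
	if err != nil {
		return
	}
	c.mu.Lock()
	c.version = v
	c.mu.Unlock()
}

// unreachable simulates a backend that never answers (for example, wrong port).
func unreachable(ctx context.Context) (string, error) {
	<-ctx.Done()
	return "", ctx.Err()
}

// fixed simulates a backend that answers immediately with v.
func fixed(v string) versionFunc {
	return func(context.Context) (string, error) { return v, nil }
}

func main() {
	cache := &versionCache{}

	// The slow check runs in the background; the caller is not delayed.
	go cache.Refresh(unreachable, defaultTimeout)
	fmt.Println(cache.Get()) // returns immediately

	cache.Refresh(fixed("example-version"), defaultTimeout)
	fmt.Println(cache.Get())
}
```

Passing the timeout as a parameter also covers the request that the timeout be configurable.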

@jmazzitelli
Collaborator

> as gRPC doesn't have an endpoint that returns the version

If this is true, and our attempt to retrieve it from the HTTP endpoint doesn't work because the port cannot be auto-detected, then we need a way to tell Kiali to not even bother to do the version check.

So I would say we can do one of two things:

  1. If gRPC is configured, rip out that code I added to try to go over HTTP on the standard port - just don't check it at all (i.e. "if gRPC is enabled, disable the version check entirely")
  2. Have a configuration option for the user to tell us to not perform the version check. This allows us to keep that code to fall back to the HTTP endpoint - so if the user does have the HTTP endpoint bound to the standard port, the version check will work. But if the user is using gRPC without the HTTP endpoint bound to the standard port, this allows them to tell Kiali to not do the check at all.

So the options are (1) implicitly disable the check if gRPC is enabled and (2) allow the user to explicitly disable the check if gRPC is enabled and they don't have an HTTP endpoint on the standard port.

I'm OK with doing either one. I'll leave it up to the person who chooses to implement this fix.
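
The two options can be compared in a short Go sketch. `tracingConfig` is a stand-in holding only the fields needed here, and `DisableVersionCheck` is a hypothetical option name, not an existing Kiali setting.

```go
package main

import "fmt"

// tracingConfig holds only the fields needed for this sketch.
// DisableVersionCheck is a hypothetical option, not an existing Kiali setting.
type tracingConfig struct {
	UseGRPC             bool
	DisableVersionCheck bool
}

// checkVersionImplicit is option 1: never check the version when gRPC is enabled.
func checkVersionImplicit(c tracingConfig) bool {
	return !c.UseGRPC
}

// checkVersionExplicit is option 2: check unless the user opted out, so the
// HTTP fallback keeps working when the endpoint is on the standard port.
func checkVersionExplicit(c tracingConfig) bool {
	return !c.DisableVersionCheck
}

func main() {
	grpcStandardPort := tracingConfig{UseGRPC: true}
	grpcOtherPort := tracingConfig{UseGRPC: true, DisableVersionCheck: true}

	fmt.Println(checkVersionImplicit(grpcStandardPort), checkVersionExplicit(grpcStandardPort))
	fmt.Println(checkVersionImplicit(grpcOtherPort), checkVersionExplicit(grpcOtherPort))
}
```

Under option 1 both configurations skip the check; under option 2 only the configuration that opted out skips it.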

@josunect
Contributor

josunect commented Feb 6, 2025

> > as gRPC doesn't have an endpoint that returns the version
>
> If this is true, and our attempt to retrieve it from the HTTP endpoint doesn't work because the port cannot be auto-detected, then we need a way to tell Kiali to not even bother to do the version check.
>
> So I would say we can do one of two things:
>
> 1. If gRPC is configured, rip out that code I added to try to go over HTTP on the standard port - just don't check it at all (i.e. "if gRPC is enabled, disable the version check entirely")
>
> 2. Have a configuration option for the user to tell us to not perform the version check. This allows us to keep that code to fall back to the HTTP endpoint - so if the user does have the HTTP endpoint bound to the standard port, the version check will work. But if the user is using gRPC without the HTTP endpoint bound to the standard port, this allows them to tell Kiali to not do the check at all.
>
> So the options are (1) implicitly disable the check if gRPC is enabled and (2) allow the user to explicitly disable the check if gRPC is enabled and they don't have an HTTP endpoint on the standard port.
>
> I'm OK with doing either one. I'll leave it up to the person who chooses to implement this fix.

I think we could go for 2) so at least it is possible to get the version in certain conditions. I can include it in PR #8131.

@jshaughn
Collaborator

jshaughn commented Feb 6, 2025

I'm also OK with either option, @josunect can decide which route to take. For option 2, we could make it opt-in, so the version check does not happen by default.

Is login the only slow-down point? I thought users were seeing slow-down across the UI.

@josunect
Contributor

josunect commented Feb 6, 2025

> I'm also OK with either option, @josunect can decide which route to take. For option 2, we could make it opt-in, so the version check does not happen by default.
>
> Is login the only slow-down point? I thought users were seeing slow-down across the UI.

I've seen the status call taking a lot of time. In my dev environment it takes more than 3 seconds, whereas the other calls take a few milliseconds.
At the moment I don't think other calls are creating a bottleneck here.

Projects
Status: 🏗 In progress
5 participants