Healthz returns incorrect status with health_check_grpc_backend active #751

Open
TheSpy opened this issue Nov 12, 2022 · 4 comments

TheSpy commented Nov 12, 2022

Hello,
I am using ESPv2 (2.39.0) with active health checking to track the backend status.
The parameters I use are:

--healthz=healthz
--health_check_grpc_backend
--health_check_grpc_backend_interval=5s

My expectation is:
when ESPv2 has started but the backend service has not started yet, the ESPv2 /healthz endpoint should fail.
However, the response I receive is 200 OK with the body { "code": 200, "message": "" }

In the ESPv2 container log I periodically see lines like these:

D1112 20:44:18.625 27 external/envoy/source/common/http/codec_client.cc:57] [27][client][C23] connecting
D1112 20:44:18.625 27 external/envoy/source/common/network/connection_impl.cc:924] [27][connection][C23] connecting to 192.168.65.2:21411
D1112 20:44:18.625 27 external/envoy/source/common/network/connection_impl.cc:943] [27][connection][C23] connection in progress
D1112 20:44:20.657 27 external/envoy/source/common/network/connection_impl.cc:695] [27][connection][C23] delayed connect error: 111
D1112 20:44:20.657 27 external/envoy/source/common/network/connection_impl.cc:250] [27][connection][C23] closing socket: 0
D1112 20:44:20.657 27 external/envoy/source/common/http/codec_client.cc:108] [27][client][C23] disconnect. resetting 1 pending requests
D1112 20:44:20.657 27 external/envoy/source/common/http/codec_client.cc:140] [27][client][C23] request reset
D1112 20:44:20.657 27 external/envoy/source/common/upstream/health_checker_impl.cc:787] [27][hc][C23] connection/stream error health_flags=healthy

Is this the intended behavior?

I use an ingress load balancer that monitors the /healthz endpoint. A pod is added to or removed from the load balancer based on that status. With the current setup, during a rolling update a pod whose backend is not ready yet is added to the load balancer, because ESPv2 reports the backend as ready, and this causes request timeouts. A few seconds later, after ESPv2 has made a couple of checks against the backend and changed the status from healthy to unhealthy, the pod is removed from the load balancer.
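One possible way to work around this window is a readiness check that does not rely on /healthz alone. The following is only a rough sketch, not something ESPv2 provides: the function names, the URL, and the host and port arguments are all hypothetical, and it assumes the probe can reach the backend port directly. It reports ready only when a TCP connection to the backend succeeds and /healthz answers 200.

```python
import http.client
import socket
import urllib.request


def backend_accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the backend can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (errno 111 in the log above) and timeouts.
        return False


def healthz_ok(url: str, timeout: float = 1.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are subclasses of OSError.
        return False


def pod_ready(healthz_url: str, backend_host: str, backend_port: int) -> bool:
    """Ready only when both the backend and the proxy look reachable."""
    return (backend_accepts_connections(backend_host, backend_port)
            and healthz_ok(healthz_url))
```

A readiness probe could call something like `pod_ready("http://127.0.0.1:8080/healthz", "127.0.0.1", 21411)`, where the first address is a placeholder and the port 21411 is only taken from the log lines in this issue. A plain TCP connect says nothing about whether the gRPC service itself is serving, so this is weaker than the gRPC health check ESPv2 runs.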

Thanks!


TheSpy commented Nov 12, 2022

UPDATE:
It seems the healthy state is replaced with unhealthy after 3 failed attempts to call the backend health check endpoint.

D1112 21:01:19.167 28 external/envoy/source/common/network/connection_impl.cc:924] [28][connection][C40] connecting to 192.168.65.2:21411
D1112 21:01:19.167 28 external/envoy/source/common/network/connection_impl.cc:943] [28][connection][C40] connection in progress
D1112 21:01:20.584 28 external/envoy/source/extensions/network/dns_resolver/cares/dns_impl.cc:341] [28][dns]dns resolution for 169.254.169.254 started
D1112 21:01:20.655 28 external/envoy/source/extensions/network/dns_resolver/cares/dns_impl.cc:262] [28][dns]dns resolution for 169.254.169.254 completed with status 0
D1112 21:01:21.206 28 external/envoy/source/common/network/connection_impl.cc:695] [28][connection][C40] delayed connect error: 111
D1112 21:01:21.206 28 external/envoy/source/common/network/connection_impl.cc:250] [28][connection][C40] closing socket: 0
D1112 21:01:21.206 28 external/envoy/source/common/http/codec_client.cc:108] [28][client][C40] disconnect. resetting 1 pending requests
D1112 21:01:21.207 28 external/envoy/source/common/http/codec_client.cc:140] [28][client][C40] request reset
D1112 21:01:21.207 28 external/envoy/source/common/upstream/health_checker_impl.cc:787] [28][hc][C40] connection/stream error health_flags=/failed_active_hc

Would it be possible to change the initial state to unhealthy?
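To make the observed transition concrete, here is a toy model of it. This is a simplified sketch of the behavior described in this comment, not Envoy's actual implementation: the class and its parameter names are made up for illustration, the threshold of 3 is taken from the three attempts observed above, and a single successful check is assumed to restore the healthy state.

```python
class ThresholdHealthState:
    """Toy model: a host turns unhealthy after N consecutive failed checks."""

    def __init__(self, unhealthy_threshold: int = 3, initially_healthy: bool = True):
        # unhealthy_threshold=3 mirrors the three attempts seen in the logs.
        # initially_healthy=True is the behavior reported in this issue;
        # False would be the behavior being asked for.
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = initially_healthy
        self.consecutive_failures = 0

    def record(self, check_succeeded: bool) -> bool:
        """Record one health check result and return the resulting state."""
        if check_succeeded:
            self.consecutive_failures = 0
            self.healthy = True  # simplification: one success is enough
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


# Backend is down from the start: the first two failures still report healthy.
state = ThresholdHealthState()
print([state.record(False) for _ in range(3)])  # [True, True, False]
```

With `initially_healthy=False` the same sequence would report unhealthy from the first check onward, which is the behavior a load balancer needs during a rollout.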

@nareddyt
Contributor

@qiwzhang could you take a look?

@qiwzhang
Contributor

Sure

qiwzhang self-assigned this Nov 14, 2022
@qiwzhang
Contributor

ESPv2 uses Envoy. Currently, it is not possible to change the initial_health_status to unhealthy.

I just checked Envoy's health check config, and this does not seem to be supported.
