Healthz returns incorrect status with health_check_grpc_backend active #751

TheSpy · 2022-11-12T20:38:26Z

Hello,
I am using ESPv2 (2.39.0) active health checking to track backend status.
The parameters I use are:

--healthz=healthz
--health_check_grpc_backend
--health_check_grpc_backend_interval=5s

My expectation is:
when ESPv2 has started and the backend service has not started yet, the ESPv2 /healthz endpoint should fail.
However, the response received is 200 OK with body { "code": 200, "message": "" }

In the ESPv2 container log I periodically see lines like these:
D1112 20:44:18.625 27 external/envoy/source/common/http/codec_client.cc:57] [27][client][C23] connecting
D1112 20:44:18.625 27 external/envoy/source/common/network/connection_impl.cc:924] [27][connection][C23] connecting to 192.168.65.2:21411
D1112 20:44:18.625 27 external/envoy/source/common/network/connection_impl.cc:943] [27][connection][C23] connection in progress
D1112 20:44:20.657 27 external/envoy/source/common/network/connection_impl.cc:695] [27][connection][C23] delayed connect error: 111
D1112 20:44:20.657 27 external/envoy/source/common/network/connection_impl.cc:250] [27][connection][C23] closing socket: 0
D1112 20:44:20.657 27 external/envoy/source/common/http/codec_client.cc:108] [27][client][C23] disconnect. resetting 1 pending requests
D1112 20:44:20.657 27 external/envoy/source/common/http/codec_client.cc:140] [27][client][C23] request reset
D1112 20:44:20.657 27 external/envoy/source/common/upstream/health_checker_impl.cc:787] [27][hc][C23] connection/stream error health_flags=healthy

Is this the intended behavior?

I use an ingress load balancer that monitors the /healthz endpoint to check status. A pod is added to or removed from the load balancer based on that status. With the current setup, during a rolling update a pod whose backend is not ready yet is added to the load balancer, because ESPv2 reports the backend as healthy. This causes request timeouts. After a couple of seconds the pod is removed from the load balancer again, once ESPv2 has made a couple of checks against the backend and changed the status from healthy to unhealthy.

Thanks!
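
One possible way to cover the rollout window described above, sketched here as an assumption and not as anything ESPv2 provides: a readiness check could probe the backend port directly and only report ready once a TCP connection succeeds. The log above shows the connect attempt failing with error 111 (connection refused), which is the condition this probe detects. The function name `backend_accepts_connections` and the one-second timeout are hypothetical.

```python
import socket


def backend_accepts_connections(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper: it only checks that something is listening on the
    port. It does not speak the gRPC health checking protocol.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # errno 111 on Linux) when nothing is listening, or on timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A readiness probe built on this would report not ready until the backend port is open, independently of the state ESPv2 reports on /healthz.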

TheSpy · 2022-11-12T21:02:33Z

UPDATE:
It seems the healthy state is replaced with unhealthy after 3 attempts to call the backend health check endpoint.

D1112 21:01:19.167 28 external/envoy/source/common/network/connection_impl.cc:924] [28][connection][C40] connecting to 192.168.65.2:21411
D1112 21:01:19.167 28 external/envoy/source/common/network/connection_impl.cc:943] [28][connection][C40] connection in progress
D1112 21:01:20.584 28 external/envoy/source/extensions/network/dns_resolver/cares/dns_impl.cc:341] [28][dns]dns resolution for 169.254.169.254 started
D1112 21:01:20.655 28 external/envoy/source/extensions/network/dns_resolver/cares/dns_impl.cc:262] [28][dns]dns resolution for 169.254.169.254 completed with status 0
D1112 21:01:21.206 28 external/envoy/source/common/network/connection_impl.cc:695] [28][connection][C40] delayed connect error: 111
D1112 21:01:21.206 28 external/envoy/source/common/network/connection_impl.cc:250] [28][connection][C40] closing socket: 0
D1112 21:01:21.206 28 external/envoy/source/common/http/codec_client.cc:108] [28][client][C40] disconnect. resetting 1 pending requests
D1112 21:01:21.207 28 external/envoy/source/common/http/codec_client.cc:140] [28][client][C40] request reset
D1112 21:01:21.207 28 external/envoy/source/common/upstream/health_checker_impl.cc:787] [28][hc][C40] connection/stream error health_flags=/failed_active_hc

Would it be possible to change the initial state to unhealthy?

nareddyt · 2022-11-14T15:44:55Z

@qiwzhang could you take a look?

qiwzhang · 2022-11-14T18:27:50Z

Sure

qiwzhang · 2022-11-15T01:32:34Z

ESPv2 uses Envoy. Currently, it is not possible to change the initial_health_status to unhealthy.

I just checked Envoy's health check config, and it does not seem to be supported.
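
The behavior reported in this thread (the backend starts out healthy and is only marked unhealthy after three failed checks) can be modelled with a small state machine. This is a minimal sketch for illustration, not Envoy's or ESPv2's code: the class name `HostHealth`, the threshold of 3 (taken from the observation above), and the recovery rule on success are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class HostHealth:
    """Toy model of an actively health-checked backend (hypothetical)."""

    unhealthy_threshold: int = 3   # assumption: value observed in this thread
    healthy: bool = True           # initial state is healthy, as reported
    consecutive_failures: int = 0

    def record(self, check_succeeded: bool) -> bool:
        """Feed one health check result and return the resulting state."""
        if check_succeeded:
            # Assumption: a single success resets the counter and restores health.
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy


if __name__ == "__main__":
    host = HostHealth()
    # Backend is down from the start: three failed checks in a row.
    print([host.record(False) for _ in range(3)])  # [True, True, False]
```

In this model the host is still reported healthy after the first two failures, which matches the window in which the load balancer routes traffic to a pod whose backend is not ready.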

qiwzhang self-assigned this Nov 14, 2022
