Figure out what the version of urllib3 should be #4715
Bumping because the issue is still occurring: 310 times in the last 24 hours, 72k times this year. This started Jan 4, seemingly after an update of urllib3 to 2.1.0; #3956 reverted to urllib3==1.26.18 and the issue was seemingly resolved. But now we're on 2.2.3, presumably because the version was not pinned and Dependabot has bumped it. Maybe we can just try reverting urllib3 again and pinning it; that would be quick to try. Or maybe upgrading opentelemetry (somewhere?) would play well with upgraded urllib3. Or maybe there is some bug in our export of telemetry (to Honeycomb?) that the upgrade has revealed and we can resolve it.
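A revert-and-pin would be a one-line change; a minimal sketch, assuming a pip-tools style layout (the file name here is a guess, the project's actual requirements layout may differ):

```
# requirements.in -- hypothetical file name.
# Pin back to the last v1 release known to predate the Jan 4 regression,
# then regenerate the lock file (e.g. with pip-compile).
urllib3==1.26.18
```

A hard pin wouldn't necessarily stop Dependabot proposing bumps, but it would at least make any future upgrade an explicit change rather than a side effect of other updates.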
All the relevant dependencies were updated in #4804. This didn't break anything in Honeycomb, but it hasn't resolved the error: we still see it in Sentry.
As far as I can tell, the whole stack here is the one shown in the full trace below: opentelemetry-sdk → OTLP HTTP exporter → requests → urllib3 → http.client.
I'd add that we don't get any such problem for OpenCodelists, which has a similar stack. job-server does have a more complicated OpenTelemetry configuration than OpenCodelists, and OpenCodelists has a longer gunicorn timeout.
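If the timeout difference is relevant, it would presumably be because gunicorn kills a worker while an export is still in flight. For reference, a minimal sketch of where that knob lives (values invented; neither app's real config appears in this thread):

```python
# gunicorn.conf.py -- hypothetical values, not either app's actual settings.
# A longer worker timeout gives in-flight work (including any synchronous
# telemetry flushing) more headroom before the worker is killed.
timeout = 120  # seconds; gunicorn's default is 30
```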
This is the latest v1 release. See #4715. Hopefully this stops the Sentry issue described there from appearing again. This is a short-to-medium-term sticking plaster, as at some point we'll have to switch to v2.
Reverting to urllib3 v1 did not stop these errors. It's odd, because although the v1 version we switched to is slightly newer than the one in use when things were supposedly working, it's not had much development: urllib3/urllib3@1.26.18...1.26.20. It's not possible to go back early enough in Sentry's event log to see the behaviour from when the issue first arose, unfortunately. Example full stack trace:
Message: Exception while exporting Span batch.

Stack trace:

RemoteDisconnected: Remote end closed connection without response
  File "urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
  File "urllib3/connectionpool.py", line 468, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "urllib3/connectionpool.py", line 463, in _make_request
    httplib_response = conn.getresponse()
  File "http/client.py", line 1428, in getresponse
    response.begin()
  File "http/client.py", line 331, in begin
    version, status, reason = self._read_status()
  File "http/client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"

ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
  File "requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "urllib3/connectionpool.py", line 802, in urlopen
    retries = retries.increment(
  File "urllib3/util/retry.py", line 552, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
  File "urllib3/connectionpool.py", line 468, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "urllib3/connectionpool.py", line 463, in _make_request
    httplib_response = conn.getresponse()
  File "http/client.py", line 1428, in getresponse
    response.begin()
  File "http/client.py", line 331, in begin
    version, status, reason = self._read_status()
  File "http/client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 360, in _export_batch
    self.span_exporter.export(self.spans_list[:idx])  # type: ignore
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 189, in export
    return self._export_serialized_spans(serialized_data)
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 159, in _export_serialized_spans
    resp = self._export(serialized_data)
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 133, in _export
    return self._session.post(
  File "requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 180, in instrumented_send
    return wrapped_send(self, request, **kwargs)
  File "requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "requests/adapters.py", line 682, in send
    raise ConnectionError(err, request=request)
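One possible mitigation, if this is just the collector end closing idle keep-alive connections: the OTLP HTTP exporter accepts a requests Session, so we could mount an adapter that retries the export POST. A minimal, untested sketch (the Honeycomb endpoint is a placeholder and the retry counts are guesses):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# POST is not in urllib3's default set of retryable methods, so opt in
# explicitly; connect/read retries cover the "remote end closed connection
# without response" case seen in the trace above.
retry = Retry(
    total=3,
    connect=3,
    read=3,
    backoff_factor=0.5,
    allowed_methods=frozenset({"POST"}),
)
session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))

exporter = OTLPSpanExporter(
    endpoint="https://api.honeycomb.io/v1/traces",  # placeholder endpoint
    session=session,
)
```

One trade-off to be aware of: retrying an export POST could in principle duplicate spans if the collector actually received the first attempt, so this papers over the symptom rather than answering the root-cause question.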
So the answer to the original question here is probably: we could switch back to the latest urllib3 v2, because we get the errors regardless. Why we get those errors is another issue.
Thanks for digging into this, Steve. After Jan 16, when we switched back to urllib3 v1 with PR #4806, we have Sentry issues classified under JOB-SERVER-HC headed
I think we're basing the supposition that #3956 was a fix on the later messages in #3955: "No errors since Jan 9th." It's possible that, in fact, #3956 was not a fix, and the error Sentry reported simply switched from JOB-SERVER-N9 to JOB-SERVER-HC without this being noticed. Unfortunately we can only look back six months in Sentry, so we can't be sure. If that's the case, I apologize for taking the text in #3955 too literally without further supporting evidence. I think we could benefit from recording a little more from Sentry, given that the data disappears; I'll raise it in planning or a retro.
I think that's right, and we should have a separate issue for resolving the root cause of the Sentry errors, although I don't know that we have any leads as to what that might be.
Following the closing of #4662, we should decide to either:

- keep urllib3 pinned to v1, or
- unpin urllib3 and move to v2.

The original reason urllib3 was pinned relates to an OpenTelemetry problem in #3955. That issue may or may not have been subsequently resolved by updates in the OpenTelemetry packages. We should check Sentry and see whether we are getting the same kind of errors as in #4662, and then decide what to do.
Note that while urllib3 v1 still appears to be maintained (there was a release on 2024-08-29), I'm not sure what the end-of-life date for support is. v2 has been out for around 18 months. Ideally, we would keep the v2 version.