
Figure out what the version of urllib3 should be #4715

Open
StevenMaude opened this issue Oct 30, 2024 · 6 comments
Labels
deck-scrubbing Tech debt or other between-initiative tidy-up work

Comments

@StevenMaude
Contributor

Following the closing of #4662, we should decide to either:

  • make the requirements file less misleading by removing the pin and the associated comment, and use the latest urllib3 v2 release
  • downgrade to urllib3 v1

The original reason urllib3 was pinned was an OpenTelemetry problem described in #3955. That issue may or may not have since been resolved by updates to the OpenTelemetry packages.

We should check Sentry and see whether we are getting the same kind of errors as in #4662, and then decide what to do.

Note that while urllib3 v1 still appears to be maintained (there was a release on 2024-08-29), I'm not sure what its end-of-life date is. v2 has been out for around 18 months. Ideally, we would stay on v2.
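
As an aside, it's worth double-checking which urllib3 the deployed environment actually has, rather than what the requirements file implies. A generic check (not code from this repo) is something like:

```python
# Generic sketch, not job-server code: report the urllib3 version actually
# installed in the current environment.
from importlib.metadata import version

import urllib3

print(version("urllib3"))   # version of the installed distribution, e.g. "2.2.3"
print(urllib3.__version__)  # version reported by the imported module
```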

@mikerkelly mikerkelly added the deck-scrubbing Tech debt or other between-initiative tidy-up work label Dec 6, 2024
@mikerkelly
Contributor

mikerkelly commented Dec 6, 2024

Bumping because the issue is still occurring: 310 times in the last 24 hours, 72k times this year.
#3955 suggests that perhaps this isn't leading to lost telemetry data, but I'm not sure we're certain of that, and losing data would be problematic, particularly if some particular class(es) of data were being dropped systematically.

This started on Jan 4, seemingly after an update of urllib3 to 2.1.0; #3956 reverted to urllib3==1.26.18 and the issue was seemingly resolved. But now we're on 2.2.3, presumably because the version was not pinned and Dependabot has bumped it.

Maybe we can just try reverting urllib3 again and pinning it. That would be quick to try. Or maybe upgrading opentelemetry (somewhere?) would play well with the upgraded urllib3. Or maybe there is some bug in our export of telemetry (to Honeycomb?) that the upgrade has revealed and we can resolve.
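
On the "are we actually losing data" point above: a possible way to check would be to wrap the exporter and count batches that fail, along the lines of the sketch below. This is hypothetical (nothing like it exists in job-server, and the class name is made up); it only assumes the OpenTelemetry SDK's SpanExporter interface, which is what BatchSpanProcessor calls when the "Exception while exporting Span batch" errors are logged.

```python
# Hypothetical sketch: wrap the real OTLP exporter so we can compare how many
# spans export successfully versus how many are lost to failed batches.
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult


class CountingSpanExporter(SpanExporter):
    def __init__(self, wrapped: SpanExporter):
        self.wrapped = wrapped
        self.exported = 0
        self.failed = 0

    def export(self, spans):
        try:
            result = self.wrapped.export(spans)
        except Exception:
            # This is the path behind "Exception while exporting Span batch":
            # the exporter raises, BatchSpanProcessor logs it, and the batch
            # is dropped.
            self.failed += len(spans)
            raise
        if result is SpanExportResult.SUCCESS:
            self.exported += len(spans)
        else:
            self.failed += len(spans)
        return result

    def shutdown(self):
        self.wrapped.shutdown()
```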

@StevenMaude
Contributor Author

All the relevant dependencies were updated in #4804.

This didn't break anything in Honeycomb, but it hasn't resolved the error: we still see it in Sentry.

@StevenMaude
Contributor Author

StevenMaude commented Jan 14, 2025

As far as I can tell, the whole stack here is:

  • nginx
  • gunicorn
  • Python 3/Django
  • urllib3
  • requests
  • opentelemetry exporter/SDK

I'd add that we don't see any such problem for OpenCodelists, which has a similar stack, although job-server does have a more complicated OpenTelemetry configuration than OpenCodelists.

OpenCodelists also has a longer gunicorn timeout.
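
For reference, the exporter piece of that stack is roughly the standard OTLP-over-HTTP wiring below (a generic sketch with placeholder endpoint and header values, not job-server's actual configuration). The BatchSpanProcessor exports on a background thread; its call into the exporter is where the errors in Sentry originate, and the exporter POSTs via a requests session, which in turn uses urllib3.

```python
# Generic OTLP/HTTP exporter wiring; endpoint and header values are
# placeholders, not job-server's real configuration.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

exporter = OTLPSpanExporter(
    endpoint="https://api.honeycomb.io/v1/traces",  # placeholder endpoint
    headers={"x-honeycomb-team": "<api-key>"},      # placeholder header
)
provider = TracerProvider(resource=Resource.create({"service.name": "job-server"}))
# BatchSpanProcessor batches spans and exports them from a background thread;
# a failed export here is what shows up as "Exception while exporting Span batch".
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```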

StevenMaude added a commit that referenced this issue Jan 14, 2025
This is the latest v1 release.

See #4715. Hopefully this stops the Sentry issue described there from
appearing again. This is a short-to-medium-term sticking plaster, as at
some point we'll have to switch to v2.
@StevenMaude
Contributor Author

StevenMaude commented Jan 21, 2025

Reverting back to urllib3 v1 did not stop these errors. It's odd, because although the urllib3 v1 version we switched to is slightly newer than the one in place when things were supposedly working, there hasn't been much development between them: urllib3/urllib3@1.26.18...1.26.20

Unfortunately, it's not possible to go back far enough in Sentry's event log to see the behaviour when the issue first arose.

Example full stack trace:

Message

Exception while exporting Span batch.

Stack Trace

RemoteDisconnected: Remote end closed connection without response
  File "urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
  File "urllib3/connectionpool.py", line 468, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
    # Permission is hereby granted, free of charge, to any person obtaining a copy
  File "urllib3/connectionpool.py", line 463, in _make_request
    httplib_response = conn.getresponse()
  File "http/client.py", line 1428, in getresponse
    response.begin()
  File "http/client.py", line 331, in begin
    version, status, reason = self._read_status()
  File "http/client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"

ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
  File "requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "urllib3/connectionpool.py", line 802, in urlopen
    retries = retries.increment(
  File "urllib3/util/retry.py", line 552, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
  File "urllib3/connectionpool.py", line 468, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
    # Permission is hereby granted, free of charge, to any person obtaining a copy
  File "urllib3/connectionpool.py", line 463, in _make_request
    httplib_response = conn.getresponse()
  File "http/client.py", line 1428, in getresponse
    response.begin()
  File "http/client.py", line 331, in begin
    version, status, reason = self._read_status()
  File "http/client.py", line 300, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 360, in _export_batch
    self.span_exporter.export(self.spans_list[:idx])  # type: ignore
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 189, in export
    return self._export_serialized_spans(serialized_data)
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 159, in _export_serialized_spans
    resp = self._export(serialized_data)
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 133, in _export
    return self._session.post(
  File "requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/venv/lib/python3.12/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 180, in instrumented_send
    return wrapped_send(self, request, **kwargs)
  File "requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "requests/adapters.py", line 682, in send
    raise ConnectionError(err, request=request)
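
Reading the trace: the BatchSpanProcessor's export thread hands the batch to the OTLP exporter, which POSTs it via its internal requests session (the `self._session.post` call above), and the remote end closes the connection before responding, so requests raises ConnectionError. Purely as a speculative sketch, and not something proposed in this thread, one could mount retries on that session so a dropped keep-alive connection is retried rather than surfacing in Sentry; note that `_session` is a private attribute (visible in the trace above), and that retrying POSTs can re-send a batch the collector may already have received.

```python
# Speculative sketch only, not job-server code: retry dropped connections on
# the OTLP exporter's requests session. `_session` is private API; it appears
# in the stack trace above.
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter()  # endpoint/headers normally come from OTEL_* env vars
retries = Retry(total=3, backoff_factor=0.5, allowed_methods=frozenset({"POST"}))
adapter = HTTPAdapter(max_retries=retries)
exporter._session.mount("http://", adapter)
exporter._session.mount("https://", adapter)
```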

@StevenMaude
Contributor Author

So the answer to the original question here is probably: we could switch back to the latest urllib3 v2, because we get the errors regardless.

Why we get those errors is another issue.

@mikerkelly
Contributor

Thanks for digging into this, Steve.

After Jan 16, when we switched back to urllib3 v1 with PR #4806, we have Sentry issues classified under JOB-SERVER-HC, headed "Exception while exporting Span batch". Before then, we had Sentry issues classified under JOB-SERVER-N9, headed "ConnectionError requests.adapters in send". They appear to be the same underlying issue, raised at line 682 of requests/adapters.py (1 2); I suppose the exception wrangling differs slightly between urllib3 versions, so they look like different issues to Sentry. I can't see any other Sentry issues with similar messages (1 2).

It seems that switching between urllib3 v1 and v2 just changes how Sentry reports the issue and doesn't actually change the behaviour.

> This started on Jan 4, seemingly after an update of urllib3 to 2.1.0; #3956 reverted to urllib3==1.26.18 and the issue was seemingly resolved.

> Reverting back to urllib3 v1 did not stop these errors. It's odd, because although the urllib3 v1 version we switched to is slightly newer than the one in place when things were supposedly working, there hasn't been much development between them

I think we're basing the supposition that #3956 was a fix on the later messages in #3955 ("No errors since Jan 9th"). It's possible that, in fact, #3956 was not a fix and the error Sentry reported simply switched from JOB-SERVER-N9 to JOB-SERVER-HC, and this wasn't noticed. Unfortunately, we can only look back six months in Sentry, so we can't be sure.

If that's the case, I apologize for taking the text in #3955 too literally without further supporting evidence. I think we could benefit from recording a little more from Sentry, given that the data disappears; I'll raise it in planning or a retro.

> So the answer to the original question here is probably: we could switch back to the latest urllib3 v2, because we get the errors regardless. Why we get those errors is another issue.

I think that's right, and we should have a separate issue for resolving the root cause of the Sentry errors, although I don't know that we have any leads as to what that might be.
