anyio.BusyResourceError on port forward #543

Closed
slevang opened this issue Jan 6, 2025 · 3 comments · Fixed by #546
Labels
bug Something isn't working

Comments

slevang commented Jan 6, 2025

Which project are you reporting a bug for?

kr8s

What happened?

xref dask/dask-kubernetes#926

I've been seeing this anyio.BusyResourceError frequently while connected to the scheduler of a large dask cluster on GKE (the connection uses a kr8s port forward). By "frequently" I mean it hits my workload somewhere between 10 minutes and an hour in. Hopefully this is something we can handle and retry when writing to the websocket of a port forward.

Anything else?

Traceback:

Unhandled exception in client_connected_cb
transport: <_SelectorSocketTransport closed fd=53>
Traceback (most recent call last):
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 737, in __aexit__
    cb_suppress = await cb(*exc_details)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 642, in _exit_wrapper
    await callback(*args, **kwds)
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 966, in close
    await self.stream.write(data)
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpcore/_async/http11.py", line 372, in write
    await self._stream.write(buffer, timeout)
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 52, in write
    await self._stream.send(item=buffer)
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/streams/tls.py", line 211, in send
    await self._call_sslobject_method(self._ssl_object.write, item)
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/streams/tls.py", line 177, in _call_sslobject_method
    await self.transport_stream.send(self._write_bio.read())
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 1256, in send
    await AsyncIOBackend.checkpoint()
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2300, in checkpoint
    await sleep(0)
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/asyncio/tasks.py", line 656, in sleep
    await __sleep0()
  File "/home/slevang/miniconda3/envs/salient/lib/python3.12/asyncio/tasks.py", line 650, in __sleep0
    yield
asyncio.exceptions.CancelledError: Cancelled by cancel scope 76c7c470acc0

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/kr8s/_portforward.py", line 226, in _sync_sockets
  |     async with self._connect_websocket() as ws:
  |                ^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 231, in __aexit__
  |     await self.gen.athrow(value)
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/kr8s/_portforward.py", line 204, in _connect_websocket
  |     async with self.pod.api.open_websocket(
  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 231, in __aexit__
  |     await self.gen.athrow(value)
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/kr8s/_api.py", line 231, in open_websocket
  |     async with httpx_ws.aconnect_ws(
  |                ^^^^^^^^^^^^^^^^^^^^^
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 231, in __aexit__
  |     await self.gen.athrow(value)
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 1308, in aconnect_ws
  |     async with _aconnect_ws(
  |                ^^^^^^^^^^^^^
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 231, in __aexit__
  |     await self.gen.athrow(value)
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 1211, in _aconnect_ws
  |     async with AsyncWebSocketSession(
  |                ^^^^^^^^^^^^^^^^^^^^^^
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 641, in __aexit__
  |     await self._exit_stack.aclose()
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 696, in aclose
  |     await self.__aexit__(None, None, None)
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 754, in __aexit__
  |     raise exc_details[1]
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 737, in __aexit__
  |     cb_suppress = await cb(*exc_details)
  |                   ^^^^^^^^^^^^^^^^^^^^^^
  |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 763, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 1029, in _background_keepalive_ping
    |     pong_callback = await self.ping()
    |                     ^^^^^^^^^^^^^^^^^
    |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 665, in ping
    |     await self.send(event)
    |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 692, in send
    |     await self.stream.write(data)
    |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpcore/_async/http11.py", line 372, in write
    |     await self._stream.write(buffer, timeout)
    |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 52, in write
    |     await self._stream.send(item=buffer)
    |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/streams/tls.py", line 211, in send
    |     await self._call_sslobject_method(self._ssl_object.write, item)
    |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/streams/tls.py", line 177, in _call_sslobject_method
    |     await self.transport_stream.send(self._write_bio.read())
    |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 1255, in send
    |     with self._send_guard:
    |          ^^^^^^^^^^^^^^^^
    |   File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/_core/_synchronization.py", line 713, in __enter__
    |     raise BusyResourceError(self.action)
    | anyio.BusyResourceError: Another task is already writing to this resource
    +------------------------------------

slevang added the bug label Jan 6, 2025

slevang commented Jan 7, 2025

So this is actually a conflict: the httpx_ws _background_keepalive_ping task tries to write to the socket at the same time as we do, which is evident if you follow the traceback above. The errors go away completely if I pass keepalive_ping_interval_seconds=None here, but disabling the keepalive pings doesn't seem ideal.

I'm not sure how to catch this and make it retriable since it can arise from either the internal httpx_ws write or the kr8s one. I tried adding a wrapper retry loop in _sync_sockets but couldn't get it to work.

@jacobtomlinson any ideas?
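
For anyone following along, here is a minimal sketch of the kind of conflict described above, using only asyncio. The `GuardedStream` class and the local `BusyResourceError` are hypothetical stand-ins written for illustration; they are not the anyio, httpx_ws, or kr8s code, and only mimic the "reject a second concurrent writer" behaviour visible in the traceback.

```python
import asyncio


class BusyResourceError(Exception):
    """Hypothetical local stand-in for anyio.BusyResourceError."""


class GuardedStream:
    """Toy stream that rejects a write while another write is in flight."""

    def __init__(self):
        self._writing = False
        self.sent = []

    async def send(self, data: bytes) -> None:
        if self._writing:
            raise BusyResourceError("Another task is already writing to this resource")
        self._writing = True
        try:
            # Yield to the event loop mid-write, so another task can run
            # while this write is still marked as in progress.
            await asyncio.sleep(0)
            self.sent.append(data)
        finally:
            self._writing = False


async def demo():
    stream = GuardedStream()
    # One coroutine stands in for the keepalive ping, the other for forwarded data.
    return await asyncio.gather(
        stream.send(b"ping"),
        stream.send(b"payload"),
        return_exceptions=True,
    )


results = asyncio.run(demo())
```

With `return_exceptions=True` the first write completes and the second comes back as a `BusyResourceError` instance, which matches the shape of the failure: whichever writer arrives second loses.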

slevang commented Jan 7, 2025

Ah, this was very recently solved in httpx-ws by adding a lock on the keepalive writes. Upgrading to 0.7.0 seems to solve my issues. Feel free to close unless you think there is something else that should be done in kr8s.
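
A rough sketch of the general idea of putting a lock around writes, assuming plain asyncio. `LockedWriter` and `raw_write` are hypothetical names made up for this example; this is not the actual httpx-ws change, just an illustration of serializing writers so they never overlap.

```python
import asyncio


class LockedWriter:
    """Serialize writes from several tasks through a single asyncio.Lock."""

    def __init__(self, write):
        self._write = write  # any async callable that takes bytes
        self._lock = asyncio.Lock()

    async def write(self, data: bytes) -> None:
        # Only one task at a time gets past this point.
        async with self._lock:
            await self._write(data)


async def demo():
    active = 0
    max_active = 0
    sent = []

    async def raw_write(data: bytes) -> None:
        # Track how many writes are in flight at once.
        nonlocal active, max_active
        active += 1
        max_active = max(max_active, active)
        await asyncio.sleep(0)  # give the other task a chance to run
        sent.append(data)
        active -= 1

    writer = LockedWriter(raw_write)
    await asyncio.gather(writer.write(b"ping"), writer.write(b"payload"))
    return max_active, sent


max_active, sent = asyncio.run(demo())
```

Here the second writer waits for the lock instead of hitting the busy stream, so at most one write is ever in flight and both payloads are delivered.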

jacobtomlinson commented

Ah that's great, thanks for investigating this. I think we should bump our minimum version of httpx-ws to 0.7.0 and then close this out.
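
Until a bumped minimum is released, a user could check their installed version themselves. The snippet below is a sketch of one way to do that with the standard library; `parse_release`, `httpx_ws_is_new_enough`, and `MIN_HTTPX_WS` are hypothetical names for this example, and this is an assumption about a possible workaround, not what kr8s does.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_HTTPX_WS = (0, 7, 0)  # the version reported above as fixing the issue


def parse_release(text: str) -> tuple:
    """Return the leading numeric part of a version string as a 3+ tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version: {text!r}")
    parts = tuple(int(part) for part in match.group().split("."))
    # Pad short versions such as "0.7" so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def httpx_ws_is_new_enough() -> bool:
    """True if the installed httpx-ws is at least MIN_HTTPX_WS; False if older or missing."""
    try:
        installed = version("httpx-ws")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= MIN_HTTPX_WS
```

Pre-release suffixes are ignored by this simple parser, so it is only a coarse check; a declared minimum in the package metadata is the more robust fix.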
