I've been seeing this anyio.BusyResourceError frequently when connected to the scheduler of a large dask cluster on GKE (which uses a kr8s port forward). By frequently I mean it hits my workload anywhere from 10 minutes to an hour in. Hopefully this is something we can just catch and retry when writing to the websocket of a port forward.
Anything else?
Traceback:
Unhandled exception in client_connected_cb
transport: <_SelectorSocketTransport closed fd=53>
Traceback (most recent call last):
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 737, in __aexit__
cb_suppress = await cb(*exc_details)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 642, in _exit_wrapper
await callback(*args, **kwds)
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 966, in close
await self.stream.write(data)
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpcore/_async/http11.py", line 372, in write
await self._stream.write(buffer, timeout)
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 52, in write
await self._stream.send(item=buffer)
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/streams/tls.py", line 211, in send
await self._call_sslobject_method(self._ssl_object.write, item)
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/streams/tls.py", line 177, in _call_sslobject_method
await self.transport_stream.send(self._write_bio.read())
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 1256, in send
await AsyncIOBackend.checkpoint()
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2300, in checkpoint
await sleep(0)
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/asyncio/tasks.py", line 656, in sleep
await __sleep0()
File "/home/slevang/miniconda3/envs/salient/lib/python3.12/asyncio/tasks.py", line 650, in __sleep0
yield
asyncio.exceptions.CancelledError: Cancelled by cancel scope 76c7c470acc0
During handling of the above exception, another exception occurred:
+ Exception Group Traceback (most recent call last):
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/kr8s/_portforward.py", line 226, in _sync_sockets
| async with self._connect_websocket() as ws:
| ^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 231, in __aexit__
| await self.gen.athrow(value)
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/kr8s/_portforward.py", line 204, in _connect_websocket
| async with self.pod.api.open_websocket(
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 231, in __aexit__
| await self.gen.athrow(value)
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/kr8s/_api.py", line 231, in open_websocket
| async with httpx_ws.aconnect_ws(
| ^^^^^^^^^^^^^^^^^^^^^
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 231, in __aexit__
| await self.gen.athrow(value)
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 1308, in aconnect_ws
| async with _aconnect_ws(
| ^^^^^^^^^^^^^
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 231, in __aexit__
| await self.gen.athrow(value)
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 1211, in _aconnect_ws
| async with AsyncWebSocketSession(
| ^^^^^^^^^^^^^^^^^^^^^^
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 641, in __aexit__
| await self._exit_stack.aclose()
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 696, in aclose
| await self.__aexit__(None, None, None)
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 754, in __aexit__
| raise exc_details[1]
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/contextlib.py", line 737, in __aexit__
| cb_suppress = await cb(*exc_details)
| ^^^^^^^^^^^^^^^^^^^^^^
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 763, in __aexit__
| raise BaseExceptionGroup(
| ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 1029, in _background_keepalive_ping
| pong_callback = await self.ping()
| ^^^^^^^^^^^^^^^^^
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 665, in ping
| await self.send(event)
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpx_ws/_api.py", line 692, in send
| await self.stream.write(data)
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpcore/_async/http11.py", line 372, in write
| await self._stream.write(buffer, timeout)
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/httpcore/_backends/anyio.py", line 52, in write
| await self._stream.send(item=buffer)
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/streams/tls.py", line 211, in send
| await self._call_sslobject_method(self._ssl_object.write, item)
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/streams/tls.py", line 177, in _call_sslobject_method
| await self.transport_stream.send(self._write_bio.read())
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 1255, in send
| with self._send_guard:
| ^^^^^^^^^^^^^^^^
| File "/home/slevang/miniconda3/envs/salient/lib/python3.12/site-packages/anyio/_core/_synchronization.py", line 713, in __enter__
| raise BusyResourceError(self.action)
| anyio.BusyResourceError: Another task is already writing to this resource
+------------------------------------
So this is actually a conflict that occurs when the httpx_ws _background_keepalive_ping task tries to write to the socket at the same time as we do, which is evident if you follow the traceback above. I was able to make the errors go away completely by passing keepalive_ping_interval_seconds=None here, but that seems less than ideal.
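For anyone following along, here is a minimal sketch of the failure mode using only asyncio. It is not the real anyio or httpx_ws code: `GuardedStream` and the local `BusyResourceError` are hypothetical stand-ins that mimic a stream which rejects a second writer while a first write is still in flight.

```python
import asyncio


class BusyResourceError(Exception):
    """Stand-in for anyio.BusyResourceError (hypothetical, for illustration)."""


class GuardedStream:
    """Toy stream that refuses concurrent writers, loosely like anyio's send guard."""

    def __init__(self):
        self._writing = False
        self.sent = []

    async def write(self, data):
        if self._writing:
            raise BusyResourceError("Another task is already writing to this resource")
        self._writing = True
        try:
            # Checkpoint: yields to the event loop, so another task can run mid-write.
            await asyncio.sleep(0)
            self.sent.append(data)
        finally:
            self._writing = False


async def demo():
    stream = GuardedStream()
    # One coroutine plays the port forward write, the other the keepalive ping.
    return await asyncio.gather(
        stream.write(b"payload"),
        stream.write(b"ping"),
        return_exceptions=True,
    )


results = asyncio.run(demo())
```

The first write succeeds and the second fails with the stand-in error, because it arrives while the first is suspended at its checkpoint. In the real traceback the roles are the same, with the keepalive ping being the writer that loses.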
I'm not sure how to catch this and make it retriable since it can arise from either the internal httpx_ws write or the kr8s one. I tried adding a wrapper retry loop in _sync_sockets but couldn't get it to work.
Ah, this was very recently solved in httpx-ws by adding a lock on the keepalive writes. Upgrading to 0.7.0 seems to solve my issues. Feel free to close unless you think there is something else that should be done in kr8s.
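I haven't copied the actual httpx-ws change, but the general pattern of putting a lock around writes can be sketched with plain asyncio as below. `LockedWriter` and `raw_write` are hypothetical names for illustration, not part of httpx-ws or kr8s.

```python
import asyncio


class LockedWriter:
    """Serializes writes so only one task touches the underlying stream at a time."""

    def __init__(self, raw_write):
        self._raw_write = raw_write  # hypothetical async callable doing the real write
        self._lock = asyncio.Lock()

    async def write(self, data):
        # A second writer waits here instead of colliding with the first.
        async with self._lock:
            await self._raw_write(data)


async def demo():
    in_flight = 0
    max_in_flight = 0
    sent = []

    async def raw_write(data):
        nonlocal in_flight, max_in_flight
        in_flight += 1
        max_in_flight = max(max_in_flight, in_flight)
        await asyncio.sleep(0)  # yield mid-write, as a real socket send would
        sent.append(data)
        in_flight -= 1

    writer = LockedWriter(raw_write)
    # One coroutine plays the application write, the other the keepalive ping.
    await asyncio.gather(writer.write(b"payload"), writer.write(b"ping"))
    return sent, max_in_flight


sent, max_in_flight = asyncio.run(demo())
```

With the lock in place both writes complete and at most one is ever in flight, so a guard like the one in the traceback would never see two writers at once.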
Which project are you reporting a bug for?
kr8s
What happened?
xref dask/dask-kubernetes#926