We are encountering an issue in a 60‑node Kubernetes cluster where certain image layers cannot be pulled via Spegel, even though one of the nodes definitely has them and Spegel logs show those layers are being advertised. In contrast, on a 10‑node cluster (same Spegel configuration), pulling the same images works fine.
> crictl pull mylocalharbor.example.com/proxy-cache/library/nginx:alpine
E0225 21:59:01.973081 1665303 remote_image.go:167] "PullImage from image service failed" err="rpc error: code = NotFound desc = failed to pull and unpack image \"mylocalharbor.example.com/proxy-cache/library/nginx:alpine\": failed to copy: httpReadSeeker: failed open: content at http://localhost:20020/v2/proxy-cache/library/nginx/manifests/sha256:a71e0884a7f1192ecf5decf062b67d46b54ad63f0cc1b8aa7e705f739a97c2fc?ns=mylocalharbor.example.com not found: not found" image="mylocalharbor.example.com/proxy-cache/library/nginx:alpine"
FATA[0000] pulling image: rpc error: code = NotFound desc = failed to pull and unpack image "mylocalharbor.example.com/proxy-cache/library/nginx:alpine": failed to copy: httpReadSeeker: failed open: content at http://localhost:20020/v2/proxy-cache/library/nginx/manifests/sha256:a71e0884a7f1192ecf5decf062b67d46b54ad63f0cc1b8aa7e705f739a97c2fc?ns=mylocalharbor.example.com not found: not found
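A first debugging step (a sketch, to be run on the node that advertises the layer; `k8s.io` is the containerd namespace the kubelet uses) is to confirm the blob actually exists in that node's containerd content store:

```shell
# Digest of the manifest that fails to resolve (from the error above)
DIGEST="sha256:a71e0884a7f1192ecf5decf062b67d46b54ad63f0cc1b8aa7e705f739a97c2fc"

# List containerd's content store and look for the digest. If this prints
# nothing, the node is advertising content it no longer holds (e.g. after GC).
if command -v ctr >/dev/null 2>&1; then
  ctr -n k8s.io content ls | grep "$DIGEST" || echo "blob not in content store"
else
  echo "ctr not installed on this machine"
fi
```

If the blob is present there, the problem is in resolution/routing rather than missing content.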
Spegel logs (Pull Node):
{"time":"2025-02-25T21:59:01.503680186Z","level":"INFO","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror","file":"/build/pkg/registry/registry.go","line":236},"msg":"handling mirror request from external node","key":"mylocalharbor.example.com/proxy-cache/library/nginx:alpine","path":"/v2/proxy-cache/library/nginx/manifests/alpine","ip":"100.96.27.160"}
{"time":"2025-02-25T21:59:01.508731705Z","level":"DEBUG","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handleMirror","file":"/build/pkg/registry/registry.go","line":287},"msg":"mirrored request","key":"mylocalharbor.example.com/proxy-cache/library/nginx:alpine","path":"/v2/proxy-cache/library/nginx/manifests/alpine","ip":"100.96.27.160","url":"http://100.96.17.54:5000"}
{"time":"2025-02-25T21:59:01.508980219Z","level":"INFO","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":132},"msg":"","path":"/v2/proxy-cache/library/nginx/manifests/alpine","status":200,"method":"HEAD","latency":"5.366138ms","ip":"100.96.27.160"}
...
{"time":"2025-02-25T21:59:01.572263309Z","level":"ERROR","source":{"function":"github.com/spegel-org/spegel/pkg/registry.(*Registry).handle.func1","file":"/build/pkg/registry/registry.go","line":135},"msg":"","err":"mirror resolve retries exhausted for key: sha256:a71e0884a7f1192ecf5decf062b67d46b54ad63f0cc1b8aa7e705f739a97c2fc","path":"/v2/proxy-cache/library/nginx/manifests/sha256:a71e0884a7f1192ecf5decf062b67d46b54ad63f0cc1b8aa7e705f739a97c2fc","status":404,"method":"GET","latency":"30.012755ms","ip":"100.96.27.160"}
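To separate routing failures from content failures, one can bypass the P2P lookup and query the peer's Spegel registry directly (a sketch; the IP and port are taken from the "mirrored request" log line above, so substitute the peer from your own logs):

```shell
# Digest of the manifest that fails to resolve (from the error above)
DIGEST="sha256:a71e0884a7f1192ecf5decf062b67d46b54ad63f0cc1b8aa7e705f739a97c2fc"

# 100.96.17.54:5000 is the peer Spegel chose in the "mirrored request" log
# line. A 200 here means the peer can serve the manifest itself, pointing
# at P2P routing (DHT resolution) rather than missing content.
curl -sI --max-time 5 \
  "http://100.96.17.54:5000/v2/proxy-cache/library/nginx/manifests/${DIGEST}?ns=mylocalharbor.example.com" \
  | head -n 1
```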
Despite Spegel advertising the layer (sha256:a71e0884a7f1192ecf5decf062b67d46b54ad63f0cc1b8aa7e705f739a97c2fc) on one node, the pull operation times out or returns 404 in the larger cluster. In the smaller (10‑node) cluster, the exact same configuration and images work perfectly.
Observed behavior:
- Spegel successfully retrieves the manifest but fails to fetch a specific blob (layer or config), returning "not found" or "mirror resolve retries exhausted."
- The node that actually holds the blob advertises it (as shown in its logs).
- Extending mirrorResolveTimeout and mirrorResolveRetries does not help.
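For completeness, the resolver settings we tuned map to these Helm chart values (assuming the standard Spegel chart layout; the numbers below are illustrative, not our exact settings):

```yaml
spegel:
  # Both were raised well beyond the defaults with no effect.
  mirrorResolveTimeout: "20s"
  mirrorResolveRetries: 5
```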
Is there a known limitation or bug in how Spegel’s P2P resolves layers in larger clusters? Any suggestions or debugging tips would be greatly appreciated. Thank you!
In a distributed system there can be many reasons for content not to be found. I also find libp2p's Kademlia DHT very difficult to debug, as there are many moving components. The libp2p project did just merge a PR that seems related to what you are seeing, caused by the wrong context being used in a query.
Spegel version
v0.0.30
Kubernetes distribution
vanilla
Kubernetes version
v1.28.4
CNI
Cilium v1.13.9