What happened?
Kepler seems to have problems with eBPF on my current setup. The Kepler logs state:
failed to create eBPF exporter: error loading eBPF objects: field KeplerIrqTrace: program kepler_irq_trace: attach Tracing/TraceRawTp: raw_tp softirq_entry not supported
However, softirq_entry is present under /sys/kernel/debug/ on the host. I found the similar issue #727, which points to a permission problem. Do I need to configure my host differently?
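A tracepoint being listed in tracefs only shows that the kernel defines it. My reading of `Tracing/TraceRawTp` in the error (not confirmed against the Kepler sources) is that kepler_irq_trace is a BTF-enabled raw tracepoint (`tp_btf`). That kind of program additionally needs kernel BTF, normally exposed at /sys/kernel/btf/vmlinux, and a kernel that supports the attach type. Here is a small POSIX sh sketch that checks the two file-level prerequisites. The optional root-prefix argument is a hypothetical addition so the function can be exercised on a fake directory tree:

```shell
#!/bin/sh
# check_softirq_support [ROOT]
# Prints whether the softirq_entry tracepoint is visible in tracefs and
# whether kernel BTF is exposed. ROOT is a hypothetical prefix (empty on a
# real host), used only to test against a fake directory tree.
# ASSUMPTION: a tp_btf program needs both; passing this check is necessary,
# not sufficient, since the kernel must also support the attach type.
check_softirq_support() {
  root=${1:-}
  tp=missing
  btf=missing
  # tracefs may be mounted at /sys/kernel/tracing or under debugfs.
  for dir in "$root/sys/kernel/tracing" "$root/sys/kernel/debug/tracing"; do
    if [ -d "$dir/events/irq/softirq_entry" ]; then
      tp=present
      break
    fi
  done
  # Kernel BTF is typically exposed here on kernels built with CONFIG_DEBUG_INFO_BTF.
  if [ -e "$root/sys/kernel/btf/vmlinux" ]; then
    btf=present
  fi
  echo "tracepoint=$tp btf=$btf"
}

# Check the real host (run as root: debugfs is usually root-only).
check_softirq_support
```

On the affected host, `btf=missing` would point at missing kernel BTF. `btf=present` on its own does not prove that the raw_tp attach type is supported.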
What did you expect to happen?
Installation succeeds.
Full Kepler log:
I0905 08:13:52.281482 1 gpu.go:38] Trying to initialize GPU collector using dcgm
W0905 08:13:52.281702 1 gpu_dcgm.go:104] There is no DCGM daemon running in the host: libdcgm.so not Found
W0905 08:13:52.281727 1 gpu_dcgm.go:108] Could not start DCGM. Error: libdcgm.so not Found
I0905 08:13:52.281733 1 gpu.go:45] Error initializing dcgm: not able to connect to DCGM: libdcgm.so not Found
I0905 08:13:52.281739 1 gpu.go:38] Trying to initialize GPU collector using nvidia-nvml
I0905 08:13:52.281789 1 gpu.go:45] Error initializing nvidia-nvml: failed to init nvml. ERROR_LIBRARY_NOT_FOUND
I0905 08:13:52.281798 1 gpu.go:38] Trying to initialize GPU collector using dummy
I0905 08:13:52.281803 1 gpu.go:42] Using dummy to obtain gpu power
I0905 08:13:52.285110 1 exporter.go:100] Kepler running on version: v0.7.11
I0905 08:13:52.285158 1 config.go:284] using gCgroup ID in the BPF program: true
I0905 08:13:52.285182 1 config.go:286] kernel version: 5.4
I0905 08:13:52.285247 1 config.go:311] The Idle power will be exposed. Are you running on Baremetal or using single VM per node?
I0905 08:13:52.285302 1 power.go:53] use sysfs to obtain power
I0905 08:13:52.285315 1 redfish.go:169] failed to get redfish credential file path
I0905 08:13:52.289657 1 power.go:73] using acpi to obtain power
I0905 08:13:52.292851 1 exporter.go:89] Number of CPUs: 16
F0905 08:13:52.412014 1 exporter.go:140] failed to create eBPF exporter: error loading eBPF objects: field KeplerIrqTrace: program kepler_irq_trace: attach Tracing/TraceRawTp: raw_tp softirq_entry not supported
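The log above uses the klog header format (`Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`). The leading letter is the severity: I, W, E or F. When trimming logs for a report, a one-line filter keeps only the warning, error and fatal entries. Here it runs on two lines copied from the log above:

```shell
#!/bin/sh
# klog_problems: read klog-formatted lines on stdin and print only
# warning (W), error (E) and fatal (F) entries.
klog_problems() {
  grep -E '^[WEF][0-9]{4} '
}

# Sample input taken from the Kepler log; only the F line is printed.
klog_problems <<'EOF'
I0905 08:13:52.285110 1 exporter.go:100] Kepler running on version: v0.7.11
F0905 08:13:52.412014 1 exporter.go:140] failed to create eBPF exporter: error loading eBPF objects: field KeplerIrqTrace: program kepler_irq_trace: attach Tracing/TraceRawTp: raw_tp softirq_entry not supported
EOF
```

To apply it to a live pod, pipe the Kepler pod's `kubectl logs` output into `klog_problems`.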
So I installed a newer kernel version, and this fixed the issue. I think the minimum kernel requirement in the docs should be updated (unless I overlooked something). I'd be happy to do this. Where would this best be stated, and what is the minimum kernel version given the eBPF features used?
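To make the minimum-version question concrete, a docs note or preflight check could compare `uname -r` against a threshold. The threshold of 5.5 below is an assumption. As far as I know, BTF-enabled raw tracepoints (the `Tracing/TraceRawTp` attach type in the error) first landed in Linux 5.5, which would be consistent with 5.4 failing. This should still be confirmed against the eBPF programs Kepler actually loads. Here is a sketch in POSIX sh that compares only major.minor:

```shell
#!/bin/sh
# kernel_at_least RELEASE MIN
# Prints "yes" if the major.minor of RELEASE is >= MIN, else "no".
# Only the first two numeric fields are compared; suffixes such as
# "-192-generic" are ignored.
kernel_at_least() {
  rel_major=${1%%.*}               # "5.4.0-192-generic" -> "5"
  rel_rest=${1#*.}                 # -> "4.0-192-generic"
  rel_minor=${rel_rest%%[!0-9]*}   # -> "4"
  min_major=${2%%.*}
  min_rest=${2#*.}
  min_minor=${min_rest%%[!0-9]*}
  if [ "$rel_major" -gt "$min_major" ] ||
     { [ "$rel_major" -eq "$min_major" ] && [ "$rel_minor" -ge "$min_minor" ]; }; then
    echo yes
  else
    echo no
  fi
}

# ASSUMPTION: 5.5 is a guessed minimum, not a documented Kepler requirement.
MIN_KERNEL=5.5
echo "kernel $(uname -r) >= $MIN_KERNEL: $(kernel_at_least "$(uname -r)" "$MIN_KERNEL")"
```

For the 5.4.0-192-generic kernel reported here, `kernel_at_least 5.4.0-192-generic 5.5` prints `no`.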
How can we reproduce it (as minimally and precisely as possible)?
helm install kepler kepler/kepler --namespace kepler --create-namespace
Anything else we need to know?
OS: Ubuntu 20.04.3 LTS x86_64
Host: SYS-1019GP-TT 0123456789
Kernel: 5.4.0-192-generic
CPU: Intel Xeon Silver 4208 (16) @ 3.200GHz
GPU: NVIDIA Quadro RTX 5000
GPU: NVIDIA Quadro RTX 5000
Memory: 95208MiB
Kepler image tag
Kubernetes version
Cloud provider or bare metal
Bare metal
OS version
Install tools
Helm, following the docs, with default values
Kepler deployment config
Container runtime (CRI) and version (if applicable)
Containerd v1.7.20-k3s1
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response