Add NFD rule for Gaudi resource driver (#69)
* add nfd rule

Signed-off-by: Oksana Baranova <[email protected]>
oxxenix authored Jan 4, 2025
1 parent 5326261 commit 48e51ae
Showing 7 changed files with 54 additions and 9 deletions.
12 changes: 10 additions & 2 deletions charts/intel-gaudi-resource-driver/Chart.yaml
@@ -3,5 +3,13 @@ name: intel-gaudi-resource-driver
 description: A Helm chart for a Dynamic Resource Allocation (DRA) Intel Gaudi Resource Driver

 type: application
-version: 0.2.0
-appVersion: "v0.2.0"
+version: 0.3.0
+appVersion: "v0.3.0"
+home: https://github.com/intel/helm-charts
+
+dependencies:
+  - name: node-feature-discovery
+    alias: nfd
+    version: "0.16.6"
+    condition: nfd.enabled
+    repository: https://kubernetes-sigs.github.io/node-feature-discovery/charts
6 changes: 4 additions & 2 deletions charts/intel-gaudi-resource-driver/README.md
@@ -16,7 +16,9 @@ helm repo update
 You can execute `helm search repo intel` command to see pulled charts [optional].

 ## Install Helm Chart
+When installing, update the dependencies:
 ```
+helm dependency update
 helm install intel-gaudi-resource-driver intel/intel-gaudi-resource-driver
 ```
 ## Upgrade Chart
@@ -43,7 +45,7 @@ You may also run `helm show values` on this chart's dependencies for additional
 | image.repository | string | `intel` |
 | image.name | string | `"intel-gaudi-resource-driver"` |
 | image.pullPolicy | string | `"IfNotPresent"` |
-| image.tag | string | `"v0.2.0"` |
+| image.tag | string | `"v0.3.0"` |

 > [!Note]
 > When upgrading, CRDs from previous version need to be removed manually because Helm supports neither upgrading nor deleting CRDs, see: https://github.com/helm/community/blob/main/hips/hip-0011.md
 > If you change the image tag to be used in Helm chart deployment, ensure that the version of the container image is consistent with deployment YAMLs - they might change between releases.
@@ -1,4 +1,4 @@
-apiVersion: resource.k8s.io/v1alpha3
+apiVersion: resource.k8s.io/v1beta1
 kind: DeviceClass
 metadata:
   name: gaudi.intel.com
16 changes: 16 additions & 0 deletions charts/intel-gaudi-resource-driver/templates/nfd.yaml
@@ -0,0 +1,16 @@
+{{- if .Values.nfd.enabled }}
+apiVersion: nfd.k8s-sigs.io/v1alpha1
+kind: NodeFeatureRule
+metadata:
+  name: intel-gaudi-device-rule
+spec:
+  rules:
+    - name: "intel.gaudi"
+      labels:
+        "intel.feature.node.kubernetes.io/gaudi": "true"
+      matchFeatures:
+        - feature: pci.device
+          matchExpressions:
+            vendor: {op: In, value: ["1da3"]}
+            device: {op: In, value: ["1020", "1030"]}
+{{- end }}
@@ -73,10 +73,15 @@ spec:
       tolerations:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      {{- if .Values.nfd.enabled }}
+      nodeSelector:
+        intel.feature.node.kubernetes.io/gaudi: "true"
+      {{- else }}
       {{- with .Values.kubeletPlugin.nodeSelector }}
       nodeSelector:
         {{- toYaml . | nindent 8 }}
       {{- end }}
+      {{- end }}
       {{- with .Values.kubeletPlugin.affinity }}
       affinity:
         {{- toYaml . | nindent 8 }}
@@ -7,7 +7,7 @@ spec:
   matchConstraints:
     resourceRules:
     - apiGroups: ["resource.k8s.io"]
-      apiVersions: ["v1alpha3"]
+      apiVersions: ["v1beta1"]
       operations: ["CREATE", "UPDATE", "DELETE"]
       resources: ["resourceslices"]
   matchConditions:
20 changes: 17 additions & 3 deletions charts/intel-gaudi-resource-driver/values.yaml
@@ -9,7 +9,7 @@ image:
   repository: intel
   name: intel-gaudi-resource-driver
   pullPolicy: IfNotPresent
-  tag: "v0.2.0"
+  tag: "v0.3.0"

 serviceAccount:
   create: true
@@ -19,13 +19,27 @@ serviceAccount:

 kubeletPlugin:
   podAnnotations: {}
+  nodeSelector: {}
+    # label used when nfd.enabled is true
+    #intel.feature.node.kubernetes.io/gaudi: "true"
   tolerations:
   - key: node-role.kubernetes.io/master
     operator: Exists
     effect: NoSchedule
   - key: node-role.kubernetes.io/control-plane
     operator: Exists
     effect: NoSchedule
-  nodeSelector: {}
-    #node-role.kubernetes.io/control-plane: ""
+  # Refer to the official documentation for Node Feature Discovery (NFD)
+  # regarding node tainting:
+  # https://nfd.sigs.k8s.io/usage/customization-guide#node-tainting
+  - key: "intel.feature.node.kubernetes.io/gaudi"
+    operator: "Exists"
+    effect: "NoSchedule"
   affinity: {}
+
+nfd:
+  enabled: false # change to true to install NFD to the cluster
+  nameOverride: intel-gaudi-nfd
+  # TODO: this deprecated NFD option will be replaced in NFD v0.17 with "featureGates.NodeFeatureAPI" (added in v0.16):
+  # https://kubernetes-sigs.github.io/node-feature-discovery/v0.16/deployment/helm.html#general-parameters
+  enableNodeFeatureApi: true
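
The `nfd.enabled` value introduced in this commit gates three things at once: the NFD subchart (through `condition: nfd.enabled` in Chart.yaml), the `NodeFeatureRule` in templates/nfd.yaml, and the hardcoded `nodeSelector` in the kubelet plugin template. A minimal override sketch for turning it on could look like the following. The file name `gaudi-nfd-values.yaml` is a hypothetical choice and not part of this commit; the keys are the ones added to values.yaml here.

```yaml
# gaudi-nfd-values.yaml (hypothetical file name, not part of this commit)
nfd:
  # Enables the NFD subchart and renders templates/nfd.yaml, whose rule
  # matches PCI vendor 1da3 with device 1020 or 1030 and sets the label
  # intel.feature.node.kubernetes.io/gaudi: "true".
  enabled: true

kubeletPlugin:
  # Not used while nfd.enabled is true: the template then emits the
  # Gaudi label as the nodeSelector and skips this value entirely.
  nodeSelector: {}
```

A file like this would typically be passed with Helm's `-f` flag at install or upgrade time. Note that, per the template change in this commit, a custom `kubeletPlugin.nodeSelector` only takes effect when `nfd.enabled` is false.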
