TopoLVM can be installed with its Helm Chart as described in Getting Started. This document describes how to install TopoLVM with advanced configurations.
You can configure the StorageClass created by the Helm Chart by editing the Helm Chart values.
- `fsType` specifies the filesystem type of the volume. Supported filesystems are ext4, xfs, and btrfs (beta).
- `volumeBindingMode` can be either `WaitForFirstConsumer` or `Immediate`. `WaitForFirstConsumer` is recommended because TopoLVM cannot schedule pods wisely if `volumeBindingMode` is `Immediate`.
- `allowVolumeExpansion` enables expanding volumes.
- `additionalParameters` defines additional parameters for the StorageClass. You can use it to set the `device-class` that the StorageClass will use. The `device-class` is described in the LVMd document.
- `reclaimPolicy` can be either `Delete` or `Retain`. If you delete a PVC whose corresponding PV has the `Retain` reclaim policy, the corresponding `LogicalVolume` resource and the LVM logical volume are NOT deleted. If you delete this `LogicalVolume` resource after deleting the PVC, the related LVM logical volume is also deleted.
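For example, a Helm values sketch like the following creates an xfs StorageClass that uses a device-class named ssd. The field layout assumes the chart's default values.yaml, and the `topolvm.io/device-class` parameter key should be verified against your TopoLVM version.

storageClasses:
  - name: topolvm-provisioner
    storageClass:
      fsType: xfs
      volumeBindingMode: WaitForFirstConsumer
      allowVolumeExpansion: true
      additionalParameters:
        "topolvm.io/device-class": "ssd" # device-class parameter; verify the key for your version
      reclaimPolicy: Delete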
Pods using TopoLVM should always be prioritized over other normal pods, because a pod using TopoLVM can only be scheduled to the single node where its volumes exist, whereas normal pods can run on any node.
The Helm Chart creates a PriorityClass by default.
You can configure its priority value by editing `priorityClass.value`.
The PriorityClass is not used by default.
To apply it to pods, you need to specify the PriorityClass name in the `priorityClassName` field of the pod spec as follows.
apiVersion: v1
kind: Pod
metadata:
  name: foo
spec:
  priorityClassName: topolvm
  ...
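If you want to change the priority value itself, a minimal Helm values sketch could look like the following. The `priorityClass.*` key layout and the example value are assumptions; check the chart's default values.yaml for your version.

priorityClass:
  enabled: true
  name: topolvm
  value: 1000001 # example value; choose one that fits your cluster's priority scheme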
LVMd is a component that manages LVM. There are three options for running LVMd: as a dedicated DaemonSet, embedded in topolvm-node, or as a systemd service. The details of each option are described below.
In general, it is recommended to run LVMd using the first or the second option.
Since these modes use `nsenter` to run `lvm`-related commands as host processes, `hostPID` and privileged mode must be enabled for the Pods.
If that is not possible for some reason or in some environments (e.g. kind), you can run LVMd as a systemd service instead.
The Helm Chart runs LVMd as a dedicated DaemonSet by default. If you want to configure LVMd, you can edit the Helm Chart values.
lvmd:
  managed: true
  socketName: /run/topolvm/lvmd.sock
  deviceClasses:
    - name: ssd
      volume-group: myvg1 # Change this value to your VG name.
      default: true
      spare-gb: 10
Note
If you are using a read-only filesystem, or `/etc/lvm` is mounted read-only, LVMd will likely fail to create volumes with status code 5.
To avoid this, you need to set an extra environment variable.
lvmd:
  env:
    - name: LVM_SYSTEM_DIR
      value: /tmp
This feature is at a very early stage, so use it with care.
In this mode, LVMd runs as an embedded function in the `topolvm-node` container.
Thanks to its lower resource consumption, it is also suitable for edge computing and IoT use cases.
To use this mode, set the Helm Chart values as follows:
lvmd:
  managed: false
node:
  lvmdEmbedded: true
Before setup, you need to get the LVMd binary.
We provide pre-built binaries on the releases page for the x86 architecture.
If you use another architecture or want to build from source, you can build it with `mkdir build; go build -o build/lvmd ./pkg/lvmd`.
To set up LVMd as a systemd service:
- Place lvmd.yaml in `/etc/topolvm/lvmd.yaml`. If you want to specify the `device-class` settings to use multiple volume groups, edit the file (a minimal example is sketched after this list). See lvmd.md for details.
- Install the LVMd binary in `/opt/sbin/lvmd` and `lvmd.service` in `/etc/systemd/system`, then start the service.
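A minimal `/etc/topolvm/lvmd.yaml` might look like the sketch below. The keys mirror the `deviceClasses` settings shown in the Helm values above; see lvmd.md for the authoritative format.

socket-name: /run/topolvm/lvmd.sock
device-classes:
  - name: ssd
    volume-group: myvg1 # Change this value to your VG name.
    default: true
    spare-gb: 10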
This section describes how to switch from LVMd running as a systemd service to the DaemonSet LVMd.
- Install the Helm Chart with LVMd configured to run as a DaemonSet. You need to set a temporary `socket-name` that is not the same as the value used by LVMd running as a systemd service. After the Helm Chart is installed, the DaemonSet LVMd and the LVMd running as a systemd service exist at the same time, using different sockets.

  <snip>
  lvmd:
    managed: true
    socketName: /run/topolvm/lvmd.sock # Change this value to something like `/run/topolvm/lvmd-work.sock`.
    deviceClasses:
      - name: ssd
        volume-group: myvg1
        default: true
        spare-gb: 10
  <snip>
- Change the options of topolvm-node to communicate with the DaemonSet LVMd instead of the LVMd running as a systemd service. You should set the temporary socket name, which is not the same as the one used by LVMd running as a systemd service.

  <snip>
  node:
    lvmdSocket: /run/lvmd/lvmd.sock # Change this value to something like `/run/lvmd/lvmd-work.sock`.
  <snip>
- Check that you can create a Pod and PVC and can access existing PVs (an example PVC is sketched after these steps).
- Stop and remove the LVMd running as a systemd service.
- Change the `socket-name` and `--lvmd-socket` options back to their original values. To apply the ConfigMap changes, restart the DaemonSet LVMd manually.

  <snip>
  lvmd:
    socketName: /run/topolvm/lvmd-work.sock # Change this value to something like `/run/topolvm/lvmd.sock`.
  <snip>
  node:
    lvmdSocket: /run/lvmd/lvmd-work.sock # Change this value to something like `/run/lvmd/lvmd.sock`.
  <snip>
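For the Pod/PVC check mentioned above, a minimal PVC sketch like the following can be applied. The StorageClass name `topolvm-provisioner` is an assumption; use the name of your TopoLVM StorageClass.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topolvm-check
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: topolvm-provisioner # assumed name; adjust to your StorageClass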
TopoLVM uses webhooks, which require TLS certificates. The default method is to use cert-manager as described in Getting Started.
If you don't want to use cert-manager, you can use your own certificates as follows:
- Prepare PEM-encoded self-signed certificate and key files. The certificate must be valid for a hostname like `topolvm-controller.topolvm-system.svc`.
- Base64-encode the CA certificate (in its PEM format).
- Create a Secret in the `topolvm-system` namespace as follows:

  kubectl -n topolvm-system create secret tls topolvm-mutatingwebhook \
    --cert=<CERTIFICATE FILE> --key=<KEY FILE>

- Specify the certificate in the Helm Chart values.

  <snip>
  webhook:
    caBundle: ... # Base64-encoded, PEM-encoded CA certificate that signs the server certificate
  <snip>
Because TopoLVM provides node-local volumes, `kube-scheduler` needs to be configured to schedule pods to nodes that have sufficient capacity to create the volumes.
There are two options: using the Storage Capacity Tracking feature or using `topolvm-scheduler`.
The former is the default option and is easy to set up. The latter is more complicated, but it can prioritize nodes by free disk capacity, which cannot be achieved with Storage Capacity Tracking.
Storage Capacity Tracking is a built-in feature of Kubernetes, and the Helm Chart uses it by default, so you don't need to do anything.
You can see the limitations of using Storage Capacity Tracking here.
topolvm-scheduler is a scheduler extender for `kube-scheduler`.
It must be deployed where `kube-scheduler` can connect to it.
If your `kube-scheduler` can't connect to pods directly, you need to run `topolvm-scheduler` as a DaemonSet on the nodes running `kube-scheduler` to ensure that `kube-scheduler` can connect to `topolvm-scheduler` via a loopback network device.
Otherwise, you can run `topolvm-scheduler` as a Deployment and create a Service to connect to it from `kube-scheduler`.
To use `topolvm-scheduler`, you need to enable it in the Helm Chart values.
scheduler:
  enabled: true
  # you can set the type to `daemonset` or `deployment`. The default is `daemonset`.
  type: daemonset
controller:
  storageCapacityTracking:
    enabled: false
webhook:
  podMutatingWebhook:
    enabled: true
`kube-scheduler` needs to be configured to use the `topolvm-scheduler` extender.
To configure `kube-scheduler`, copy scheduler-config.yaml to the hosts where `kube-scheduler` runs.
If you are using `topolvm-scheduler` as a Deployment, you need to edit the `urlPrefix` in the file to specify the LoadBalancer address.
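In that case, the extender entry in scheduler-config.yaml would look like the sketch below. The address is a placeholder, and port 9251 follows the default shown in the configuration at the end of this document.

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
extenders:
  - urlPrefix: "http://<LOADBALANCER ADDRESS>:9251" # placeholder for your Service's LoadBalancer address
    filterVerb: "predicate"
    prioritizeVerb: "prioritize"
    nodeCacheCapable: false
    managedResources:
      - name: "topolvm.io/capacity"
        ignoredByScheduler: true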
If you are installing your cluster from scratch with `kubeadm`, you can use the following configuration:
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
metadata:
  name: config
kubernetesVersion: v1.30.2
scheduler:
  extraVolumes:
    - name: "config"
      hostPath: /path/to/scheduler-config # absolute path to the directory containing scheduler-config.yaml
      mountPath: /var/lib/scheduler
      readOnly: true
  extraArgs:
    config: /var/lib/scheduler/scheduler-config.yaml
To configure `kube-scheduler` installed by `kubeadm`, you need to edit `/etc/kubernetes/manifests/kube-scheduler.yaml` as follows:
- Add a line to the `command` arguments array such as `- --config=/var/lib/scheduler/scheduler-config.yaml`. Note that this is the location of the file after it is mapped to the `kube-scheduler` container, not where it exists on the node local filesystem.
- Add a volume mapping to the location of the configuration on your node:

  spec.volumes:
  - hostPath:
      path: /path/to/scheduler-config # absolute path to the directory containing scheduler-config.yaml
      type: Directory
    name: topolvm-config

- Add a `volumeMount` for the scheduler container:

  spec.containers.volumeMounts:
  - mountPath: /var/lib/scheduler
    name: topolvm-config
    readOnly: true
The node scoring for pod scheduling can be fine-tuned in the following two ways:
- Adjust the `divisor` parameter for `topolvm-scheduler`
- Change the weight of the node scoring relative to the default used by `kube-scheduler`
The first method tunes the node scoring calculation done by `topolvm-scheduler` itself.
To adjust the parameter, you can set the Helm Chart value `scheduler.schedulerOptions`.
The parameter is described in detail in topolvm-scheduler.
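For example, a Helm values sketch like the one below adjusts the divisors. The `default-divisor` and `divisors` keys, and the device-class names, are assumptions based on the topolvm-scheduler documentation; verify them for your version.

scheduler:
  schedulerOptions:
    default-divisor: 1 # used for device-classes without an explicit divisor
    divisors:
      ssd: 1  # assumed device-class names; use your own
      hdd: 10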
The second method changes the weight of the node score from `topolvm-scheduler`.
The weight can be passed to `kube-scheduler` via scheduler-config.yaml.
Almost all scoring algorithms in `kube-scheduler` are weighted as `"weight": 1`, so if you want to give priority to the scoring by `topolvm-scheduler`, you have to set the weight to a value larger than one, as follows:
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
...
extenders:
  - urlPrefix: "http://127.0.0.1:9251"
    filterVerb: "predicate"
    prioritizeVerb: "prioritize"
    nodeCacheCapable: false
    weight: 100 ## EDIT THIS FIELD ##
    managedResources:
      - name: "topolvm.io/capacity"
        ignoredByScheduler: true