UClamp
UClamp is a scheduler mechanism present in mainline Linux that serves as an effective replacement for the out-of-tree SchedTune mechanism, which is still widely used on many Android devices.
Much like SchedTune, it controls the boosting of certain tasks in the system, but it also controls the amount of utilization a specific task can request, also called capping a task. This allows much greater control over the resources of a system.
UClamp allows controlling the utilization of the tasks in the system as a whole, or of the tasks in individual cgroups within the cpuset[1].
There are three uclamp knobs that influence all the tasks in the root cgroup, or all the tasks in the system in general (listed in /dev/cpuset/tasks):
sched_util_clamp_min
sched_util_clamp_max
sched_util_clamp_min_rt_default
The possible values these three knobs accept are in the range [0-1024].
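As a small illustration of that range, a helper like the following (hypothetical, not part of the kernel interface) could validate a value before it is written to one of these knobs:

```shell
# Hypothetical helper: check that a value fits the [0-1024] range
# accepted by the sched_util_clamp_* knobs.
valid_clamp() {
    [ "$1" -ge 0 ] && [ "$1" -le 1024 ]
}

valid_clamp 512 && echo "512 is valid"
valid_clamp 2000 || echo "2000 is out of range"
```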
sched_util_clamp_min controls the minimum utilization a task can request.
Once a task's utilization crosses the limit specified in this tunable, the scheduler ramps up to accommodate the increased utilization.
sched_util_clamp_max controls the maximum utilization a task can request.
UClamp restricts increasing utilization requests, and does not allow the scheduler to ramp up beyond the utilization specified in this tunable.
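Conceptually, the two knobs clamp a task's utilization request into a window: a low request is raised to the minimum and a high request is capped at the maximum. A rough sketch of the effect in shell arithmetic (the real kernel logic is more involved):

```shell
# Sketch: a task's utilization request is raised to at least
# clamp_min and capped at clamp_max.
clamp_util() {
    util=$1; clamp_min=$2; clamp_max=$3
    [ "$util" -lt "$clamp_min" ] && util=$clamp_min
    [ "$util" -gt "$clamp_max" ] && util=$clamp_max
    echo "$util"
}

clamp_util 100 200 800   # low request is raised to 200
clamp_util 900 200 800   # high request is capped at 800
```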
sched_util_clamp_min_rt_default controls the utilization of real-time (RT) tasks.
Summarizing from the documentation[2], RT tasks normally run at high priority/frequency and on CPUs with higher capacities. This tunable allows controlling this behavior and setting a limit on the utilization RT tasks can request.
This knob is directly influenced by sched_util_clamp_min; quoting the documentation, it does not escape the range constraint imposed by sched_util_clamp_min.
For example,
If:
sched_util_clamp_min_rt_default = 800
sched_util_clamp_min = 600
then effectively:
sched_util_clamp_min_rt_default = 600
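The example above amounts to capping the RT default at sched_util_clamp_min, i.e. taking the minimum of the two values. A small sketch (hypothetical helper, not a kernel interface):

```shell
# Sketch: the effective RT default cannot escape the constraint
# imposed by sched_util_clamp_min, so it is capped to it.
effective_rt_default() {
    rt_default=$1; clamp_min=$2
    if [ "$rt_default" -gt "$clamp_min" ]; then
        echo "$clamp_min"
    else
        echo "$rt_default"
    fi
}

effective_rt_default 800 600   # prints 600, as in the example above
```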
Now, there are knobs for each cgroup in the cpusets as well (top-app/, foreground/, background/, etc.).
These knobs only influence the tasks in that cgroup. They are:
uclamp.min
uclamp.max
uclamp.latency_sensitive
uclamp.boosted
uclamp.min and uclamp.max have the same behaviours as sched_util_clamp_min and sched_util_clamp_max respectively.
The values that these knobs accept are percentages (in the range 0-100), unlike the sched_util_clamp_{min, max} knobs.
Also, writing max is equivalent to writing 100 to either of these knobs.
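Since the per-cgroup knobs take percentages while the system-wide knobs use the [0-1024] utilization scale, a percentage maps roughly to utilization units as percent * 1024 / 100. A sketch of the conversion (the kernel's exact rounding may differ):

```shell
# Sketch: convert a per-cgroup percentage (0-100) to the 0-1024
# utilization scale used by the sched_util_clamp_* knobs.
# Note: the kernel's exact rounding may differ.
percent_to_util() {
    echo $(( $1 * 1024 / 100 ))
}

percent_to_util 100   # 1024
percent_to_util 50    # 512
```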
For example, writing:
echo max > /dev/cpuset/top-app/uclamp.max
is equivalent to writing:
echo 100 > /dev/cpuset/top-app/uclamp.max
uclamp.latency_sensitive is similar to SchedTune's .prefer_idle knob, where it specifies whether to allow tasks to use idle cores as well.
Writing 1 to it enables this feature, and 0 disables it.
uclamp.boosted[3] is similar to SchedTune's .boost knob, where it specifies which tasks to boost.
Writing 1 to it enables this feature, and 0 disables it.
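Putting the per-cgroup knobs together, tuning a cgroup could look like the following. This is illustrative only: the snippet writes into a temporary stand-in directory, since the real knobs under /dev/cpuset/top-app require root and a uclamp-enabled kernel.

```shell
# Illustrative only: a temporary directory stands in for
# /dev/cpuset/top-app, which requires root on a real device.
knobs=$(mktemp -d)

echo max > "$knobs/uclamp.max"               # no utilization cap
echo 50  > "$knobs/uclamp.min"               # request at least 50%
echo 1   > "$knobs/uclamp.latency_sensitive" # allow idle cores
echo 1   > "$knobs/uclamp.boosted"           # boost these tasks

cat "$knobs/uclamp.min"   # prints 50
```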
[1]
By default, UClamp places the uclamp.{max, min, latency_sensitive} knobs in /dev/cpuctl.
Since the knobs have the CFTYPE_NOT_ON_ROOT flag and the relevant control groups aren't present there, the knobs are not created.
Therefore, as of the commit (sched/uclamp: Move all tunables to cpusets), all the relevant
knobs have been moved to the cpuset code.
[2]
Link to the docs:
https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/sysctl/kernel.rst#sched-util-clamp-min
https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/sysctl/kernel.rst#sched-util-clamp-max
https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/sysctl/kernel.rst#sched-util-clamp-min-rt-default
[3]
This out-of-tree knob was added in this commit (sched/uclamp: Make uclamp_boosted() return proper boosted value) because, by default, uclamp boosts tasks if sched_util_clamp_min is non-zero.
Therefore, it would boost all tasks whenever sched_util_clamp_min is set to a non-zero value, which isn't appropriate.
I modified it to be manually adjustable for each cgroup, so that we can appropriately boost tasks as required.