fair_queue: make the fair_group token grabbing discipline more fair #2616
Merged (+352 −61)
Changes from 1 commit; 4 commits in total:
- 6a33387 apps/io_tester: add some test cases for the IO scheduler (michoecho)
- 1e597cd test: in fair_queue_test, ensure that tokens are only replenished by … (michoecho)
- 7781c4c fair_queue: track the total capacity of queued requests (michoecho)
- 2330929 fair_queue: make the fair_group token grabbing discipline more fair (michoecho)
tests/manual/iosched_reproducers/one_cpu_starved_shard_can_still_saturate_io.sh (new file, 39 additions, 0 deletions)
```bash
#!/usr/bin/env bash

# Test scenario:
# A single CPU-starved shard has a batch IO job.
# Goal: it should be able to utilize the entire bandwidth of the disk,
# despite the rare polls.

if [ $# -ne 1 ]; then
    echo "Usage: $0 IO_TESTER_EXECUTABLE" >&2
    exit 1
fi

"$1" --smp=7 --storage=/dev/null --conf=<(cat <<'EOF'
- name: tablet-streaming
  data_size: 1GB
  shards: [0]
  type: seqread
  shard_info:
    parallelism: 50
    reqsize: 128kB
    shares: 200
- name: cpuhog
  type: cpu
  shards: [0]
  shard_info:
    parallelism: 1
    execution_time: 550us

EOF
) --io-properties-file=<(cat <<'EOF'
# i4i.2xlarge
disks:
  - mountpoint: /dev
    read_bandwidth: 1542559872
    read_iops: 218786
    write_bandwidth: 1130867072
    write_iops: 121499
EOF
)
```

Review comment on the `execution_time: 550us` line: "Well, almost, since this is above the task quota, but not by much."
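Here is a rough check on this scenario's goal. If shard 0 polls only about once every 550 µs, each poll has to dispatch enough reads to cover 550 µs of disk time. The sketch below rests on two assumptions, neither of which comes from the PR. First, a read costs `1/read_iops + size/read_bandwidth` seconds of disk time. Second, `128kB` means 131072 bytes. `requests_per_poll` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: how many reads of a given size a shard must dispatch
# per poll to keep the disk busy until its next poll.
# ASSUMPTION: a read costs 1/read_iops + size/read_bandwidth seconds of disk time.
requests_per_poll() {
    local bw=$1 iops=$2 reqsize=$3 poll_us=$4
    awk -v bw="$bw" -v iops="$iops" -v req="$reqsize" -v poll_us="$poll_us" 'BEGIN {
        cost = 1 / iops + req / bw              # disk-seconds per read (assumed model)
        printf "%.2f\n", (poll_us / 1000000) / cost
    }'
}

# i4i.2xlarge read properties from the script; 128kB taken as 131072 bytes.
requests_per_poll 1542559872 218786 131072 550   # prints 6.14
requests_per_poll 1542559872 218786 131072 1     # prints 0.01
```

Under these assumptions, a shard that polls every ~550 µs must dispatch about six 128kB reads per poll. At a ~1 µs poll interval it would need only a hundredth of a read per poll. `parallelism: 50` keeps far more reads queued than that. The limiting factor is therefore presumably how much capacity the shard may claim from the shared token pool in a single poll.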
tests/manual/iosched_reproducers/one_cpu_starved_shard_has_reasonable_fairness.sh (new file, 39 additions, 0 deletions)
```bash
#!/usr/bin/env bash

# Test scenario:
# all shards contend for IO, but one shard is additionally CPU-starved
# and polls rarely.
# Goal: it should still be getting a reasonably fair share of disk bandwidth.

if [ $# -ne 1 ]; then
    echo "Usage: $0 IO_TESTER_EXECUTABLE" >&2
    exit 1
fi

"$1" --smp=7 --storage=/dev/null --conf=<(cat <<'EOF'
- name: tablet-streaming
  data_size: 1GB
  shards: all
  type: seqread
  shard_info:
    parallelism: 50
    reqsize: 128kB
    shares: 200
- name: cpuhog
  type: cpu
  shards: [0]
  shard_info:
    parallelism: 1
    execution_time: 550us

EOF
) --io-properties-file=<(cat <<'EOF'
# i4i.2xlarge
disks:
  - mountpoint: /dev
    read_bandwidth: 1542559872
    read_iops: 218786
    write_bandwidth: 1130867072
    write_iops: 121499
EOF
) --duration=2
```
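To put a number on "a reasonably fair share": all seven shards run the same `tablet-streaming` job, so an even split gives each shard about 1/7 of the disk. The sketch below converts that split into 128kB reads per second and MB/s. It assumes the same linear cost model as before (`1/read_iops + size/read_bandwidth` seconds per read) and takes 128kB as 131072 bytes. `fair_share` is a hypothetical helper. It computes only the ideal 1/7 split, not a pass/fail threshold.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one shard's ideal 1/N share of the disk, expressed as
# reads/s and MB/s for a given request size.
# ASSUMPTION: a read costs 1/read_iops + size/read_bandwidth seconds of disk time.
fair_share() {
    local bw=$1 iops=$2 reqsize=$3 nshards=$4
    awk -v bw="$bw" -v iops="$iops" -v req="$reqsize" -v n="$nshards" 'BEGIN {
        cost = 1 / iops + req / bw      # disk-seconds per read (assumed model)
        rps = 1 / (n * cost)            # reads/s that consume exactly 1/n of the disk
        printf "%.0f req/s, %.0f MB/s\n", rps, rps * req / 1000000
    }'
}

# i4i.2xlarge read properties, 7 shards, 128kB reads.
fair_share 1542559872 218786 131072 7   # prints "1595 req/s, 209 MB/s"
```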
tests/manual/iosched_reproducers/scylla_tablet_migration.sh (new file, 77 additions, 0 deletions)
```bash
#!/usr/bin/env bash

# Test scenario:
# Simulation of a ScyllaDB workload which prompted some changes to the IO scheduler:
# database queries concurrent with tablet streaming.
#
# All 7 shards are running a low-priority (200 shares) batch IO workload
# and a high-priority (1000 shares), moderate-bandwidth, interactive workload.
#
# The interactive workload requires about 30% of the node's
# total bandwidth (as measured in tokens), in small random reads.
# The batch workload does large sequential reads and wants to utilize all
# spare bandwidth.
#
# This workload is almost symmetric across shards, but it is slightly skewed:
# shard 0 is slightly more loaded. But even on this shard, the workload
# doesn't need more than 35% of the fair bandwidth of this shard.
#
# Due to the distribution of shares across IO classes, the user expects that
# the interactive workload should be guaranteed (1000 / (1000 + 200)) == ~83% of
# the disk bandwidth on each shard. So if it's only asking for less than 35%,
# the lower-priority job shouldn't disturb it.
#
# But before the relevant IO scheduler changes, this goal wasn't met,
# and the interactive workload on shard 0 was instead starved for IO
# by the low-priority workloads on other shards.

if [ $# -ne 1 ]; then
    echo "Usage: $0 IO_TESTER_EXECUTABLE" >&2
    exit 1
fi

"$1" --smp=7 --storage=/dev/null --conf=<(cat <<'EOF'
- name: tablet-streaming
  data_size: 1GB
  shards: all
  type: seqread
  shard_info:
    parallelism: 50
    reqsize: 128kB
    shares: 200
- name: cassandra-stress
  shards: all
  type: randread
  data_size: 1GB
  shard_info:
    parallelism: 100
    reqsize: 1536
    shares: 1000
    rps: 75
  options:
    pause_distribution: poisson
    sleep_type: steady
- name: cassandra-stress-slight-imbalance
  shards: [0]
  type: randread
  data_size: 1GB
  shard_info:
    parallelism: 100
    reqsize: 1536
    class: cassandra-stress
    rps: 10
  options:
    pause_distribution: poisson
    sleep_type: steady

EOF
) --io-properties-file=<(cat <<'EOF'
# i4i.2xlarge
disks:
  - mountpoint: /dev
    read_bandwidth: 1542559872
    read_iops: 218786
    write_bandwidth: 1130867072
    write_iops: 121499
EOF
)
```
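The percentages in the script's header comment can be checked with back-of-the-envelope arithmetic. The sketch below rests on two assumptions, neither taken from the PR. First, a read costs `1/read_iops + size/read_bandwidth` seconds of disk time. Second, io_tester's `rps` applies per fiber, so each job issues `parallelism * rps` reads per second on each shard where it runs. `token_shares` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope check of the percentages quoted in the script comment.
# ASSUMPTIONS: a read costs 1/read_iops + size/read_bandwidth seconds of disk
# time, and each job issues parallelism * rps reads per second per shard.
BW=1542559872   # read_bandwidth from the io-properties file
IOPS=218786     # read_iops from the io-properties file
SHARDS=7

token_shares() {
    awk -v bw="$BW" -v iops="$IOPS" -v n="$SHARDS" 'BEGIN {
        cost = 1 / iops + 1536 / bw            # one 1536-byte random read
        per_shard = 100 * 75 * cost            # cassandra-stress on every shard
        shard0 = per_shard + 100 * 10 * cost   # plus the imbalance job on shard 0
        printf "node: %.1f%%\n", 100 * n * per_shard
        printf "shard 0 vs. its fair share: %.1f%%\n", 100 * shard0 * n
        printf "guaranteed by shares: %.1f%%\n", 100 * 1000 / (1000 + 200)
    }'
}

token_shares
```

Under these assumptions, the interactive load comes to about 29.2% of the node's disk time. On shard 0 it comes to about 33.1% of that shard's 1/7 share. Both figures agree with the comment's "about 30%" and "no more than 35%". The share ratio 1000 / (1000 + 200) works out to 83.3%.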
New file (58 additions, 0 deletions):

```bash
#!/usr/bin/env bash

# There is a `tau` mechanism in `fair_queue` which lets newly activated
# IO classes monopolize the shard's IO queue for a while.
#
# This isn't very useful and can result in major performance problems,
# as this test illustrates. The `highprio` workload could have a tail latency
# of about 2 milliseconds, but `tau` allows `bursty_lowprio` to butt in
# periodically and preempt `highprio` for ~30ms, raising `highprio`'s tail
# latency to about 30ms.

if [ $# -ne 1 ]; then
    echo "Usage: $0 IO_TESTER_EXECUTABLE" >&2
    exit 1
fi

"$1" --smp=7 --storage=/dev/null --conf=<(cat <<'EOF'
- name: filler
  data_size: 1GB
  shards: all
  type: seqread
  shard_info:
    parallelism: 10
    reqsize: 128kB
    shares: 10
- name: bursty_lowprio
  data_size: 1GB
  shards: all
  type: seqread
  shard_info:
    parallelism: 1
    reqsize: 128kB
    shares: 100
    batch: 50
    rps: 8
- name: highprio
  shards: all
  type: randread
  data_size: 1GB
  shard_info:
    parallelism: 100
    reqsize: 1536
    shares: 1000
    rps: 50
  options:
    pause_distribution: poisson
    sleep_type: steady
EOF
) --io-properties-file=<(cat <<'EOF'
# i4i.2xlarge
disks:
  - mountpoint: /dev
    read_bandwidth: 1542559872
    read_iops: 218786
    write_bandwidth: 1130867072
    write_iops: 121499
EOF
)
```
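One rough way to relate the `bursty_lowprio` settings to the ~30ms figure in the comment is to estimate how much disk time one burst represents. The sketch below rests on three assumptions, none stated in the PR. First, each `batch: 50` burst issues 50 back-to-back 128kB (131072-byte) reads. Second, a read costs `1/read_iops + size/read_bandwidth` seconds of disk time. Third, because `filler` keeps every shard busy, each of the 7 shards gets about 1/7 of the disk. `burst_ms` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: milliseconds needed to drain `batch` reads of `reqsize`
# bytes when the shard gets only 1/shards of the disk.
# ASSUMPTION: a read costs 1/read_iops + size/read_bandwidth seconds of disk time.
burst_ms() {
    local batch=$1 reqsize=$2 shards=$3
    awk -v b="$batch" -v req="$reqsize" -v n="$shards" 'BEGIN {
        cost = 1 / 218786 + req / 1542559872   # i4i.2xlarge read properties
        printf "%.1f\n", 1000 * b * n * cost
    }'
}

burst_ms 50 131072 1   # whole disk available: prints 4.5
burst_ms 50 131072 7   # 1/7 of the disk per shard: prints 31.3
```

Under these assumptions, a shard's 50-read burst takes about 31 ms to drain at its 1/7 share. The same figure results if all seven shards' bursts line up on the whole disk. That is the same order as the ~30ms preemption described in the script comment. This is an estimate, not a measurement.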
Review discussion on the term "CPU-starved":

> It's not really starved (IIUC). It's saturated but the I/O fiber will receive CPU every task quota, which is as much as it can expect.

> "CPU-starved" here means that there is always more work for the CPU to do. The important part is that the shard only polls once per ~500us, not ~1us.

> I know that "starved" usually has another meaning, but I don't have a better word here. Saturated?

> The I/O fiber would be starved if it needed CPU but got it less than once per task quota.
> The reactor (not the I/O fiber) is saturated: it doesn't have any cycles to spare, so it isn't able to provide better-than-expected service (CPU every 1usec). But it's not starving the I/O fiber, merely sticking to the expectations.
> I'd read "reactor starved" to mean that something outside the reactor is eating the CPU.