Fix IO might be hanging in batch completion storage system. #1466

Open
wants to merge 2 commits into
base: master
13 changes: 13 additions & 0 deletions HOWTO.rst
@@ -2890,6 +2890,19 @@ I/O depth
if none of I/O has been completed yet, we will NOT wait and immediately exit
the system call. In this example we simply do polling.

.. option:: iodepth_batch_complete_omit=int

The number of in-flight I/Os that need not be retrieved when quiescing. The
default is 0, which means all in-flight I/Os are retrieved to completion so
as not to skew latency. After that, when rate limiting is in effect, the
next I/O is issued after a fixed delay. In some cases, however, in-flight
I/Os may need to stay pending for a while so that subsequent I/Os can be
merged with them, for example to get better bandwidth or to reach the EC
stripe length. That waiting time may delay the next I/O considerably if all
in-flight I/Os must be retrieved first. This option therefore lets quiescing
skip some I/Os (those waiting to be merged) to keep data flowing smoothly.
As such, it is not suitable for latency-sensitive scenarios.

.. option:: iodepth_low=int

The low water mark indicating when to start filling the queue
2 changes: 2 additions & 0 deletions cconv.c
@@ -98,6 +98,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
o->iodepth_batch = le32_to_cpu(top->iodepth_batch);
o->iodepth_batch_complete_min = le32_to_cpu(top->iodepth_batch_complete_min);
o->iodepth_batch_complete_max = le32_to_cpu(top->iodepth_batch_complete_max);
o->iodepth_batch_complete_omit = le32_to_cpu(top->iodepth_batch_complete_omit);
o->serialize_overlap = le32_to_cpu(top->serialize_overlap);
o->size = le64_to_cpu(top->size);
o->io_size = le64_to_cpu(top->io_size);
@@ -379,6 +380,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
top->iodepth_batch = cpu_to_le32(o->iodepth_batch);
top->iodepth_batch_complete_min = cpu_to_le32(o->iodepth_batch_complete_min);
top->iodepth_batch_complete_max = cpu_to_le32(o->iodepth_batch_complete_max);
top->iodepth_batch_complete_omit = cpu_to_le32(o->iodepth_batch_complete_omit);
top->serialize_overlap = cpu_to_le32(o->serialize_overlap);
top->size_percent = cpu_to_le32(o->size_percent);
top->io_size_percent = cpu_to_le32(o->io_size_percent);
12 changes: 12 additions & 0 deletions fio.1
@@ -2650,6 +2650,18 @@ if none of I/O has been completed yet, we will NOT wait and immediately exit
the system call. In this example we simply do polling.
.RE
.TP
.BI iodepth_batch_complete_omit \fR=\fPint
The number of in-flight I/Os that need not be retrieved when quiescing. The
default is 0, which means all in-flight I/Os are retrieved to completion so
as not to skew latency. After that, when rate limiting is in effect, the
next I/O is issued after a fixed delay. In some cases, however, in-flight
I/Os may need to stay pending for a while so that subsequent I/Os can be
merged with them, for example to get better bandwidth or to reach the EC
stripe length. That waiting time may delay the next I/O considerably if all
in-flight I/Os must be retrieved first. This option therefore lets quiescing
skip some I/Os (those waiting to be merged) to keep data flowing smoothly.
As such, it is not suitable for latency-sensitive scenarios.
.TP
.BI iodepth_low \fR=\fPint
The low water mark indicating when to start filling the queue
again. Defaults to the same as \fBiodepth\fR, meaning that fio will
2 changes: 1 addition & 1 deletion io_u.c
@@ -622,7 +622,7 @@ int io_u_quiesce(struct thread_data *td)
if (td->io_u_queued || td->cur_depth)
td_io_commit(td);

-while (td->io_u_in_flight) {
+while (td->io_u_in_flight > td->o.iodepth_batch_complete_omit) {
ret = io_u_queued_complete(td, 1);
if (ret > 0)
completed += ret;
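
Review note: the effect of the new loop condition can be sketched in isolation. This is not fio code. `struct sim_thread`, `sim_reap_one()` and `sim_quiesce()` are hypothetical stand-ins for `struct thread_data`, `io_u_queued_complete(td, 1)` and `io_u_quiesce()`, and the sketch assumes each reap call completes exactly one I/O.

```c
#include <assert.h>

/* Hypothetical stand-in for the two values the patched loop compares. */
struct sim_thread {
	unsigned int in_flight; /* models td->io_u_in_flight */
	unsigned int omit;      /* models td->o.iodepth_batch_complete_omit */
};

/* Hypothetical stand-in for io_u_queued_complete(td, 1): reaps one I/O. */
static int sim_reap_one(struct sim_thread *t)
{
	if (t->in_flight == 0)
		return 0;
	t->in_flight--;
	return 1;
}

/*
 * Mirrors the patched condition: keep reaping only while more than
 * 'omit' I/Os are in flight. Returns the number of I/Os reaped.
 */
static unsigned int sim_quiesce(struct sim_thread *t)
{
	unsigned int completed = 0;

	while (t->in_flight > t->omit) {
		int ret = sim_reap_one(t);

		if (ret > 0)
			completed += ret;
	}
	return completed;
}
```

Under these assumptions, `omit = 0` drains every in-flight I/O as before, `omit = 3` with 8 in flight reaps 5 and leaves 3 pending, and an `omit` value at or above the in-flight count makes the quiesce a no-op.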
14 changes: 14 additions & 0 deletions options.c
@@ -2206,6 +2206,20 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
.category = FIO_OPT_C_IO,
.group = FIO_OPT_G_IO_BASIC,
},
{
.name = "iodepth_batch_complete_omit",
.lname = "Omit IO depth batch complete",
.type = FIO_OPT_INT,
.off1 = offsetof(struct thread_options, iodepth_batch_complete_omit),
.help = "The number of in-flight IOs that need not be retrieved",
.parent = "iodepth",
.hide = 1,
.minval = 0,
.interval = 1,
.def = "0",
.category = FIO_OPT_C_IO,
.group = FIO_OPT_G_IO_BASIC,
},
{
.name = "iodepth_low",
.lname = "IO Depth batch low",
3 changes: 2 additions & 1 deletion thread_options.h
@@ -96,6 +96,7 @@ struct thread_options {
unsigned int iodepth_batch;
unsigned int iodepth_batch_complete_min;
unsigned int iodepth_batch_complete_max;
unsigned int iodepth_batch_complete_omit;
unsigned int serialize_overlap;

unsigned int unique_filename;
@@ -414,8 +415,8 @@ struct thread_options_pack {
uint32_t iodepth_batch;
uint32_t iodepth_batch_complete_min;
uint32_t iodepth_batch_complete_max;
uint32_t iodepth_batch_complete_omit;
uint32_t serialize_overlap;
uint32_t pad;

uint64_t size;
uint64_t io_size;
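
Review note: the file summary for this hunk reports one deletion, and the hunk line counts (8 before, 8 after) suggest the deleted line is `uint32_t pad;`, with the new field taking its slot. The sketch below checks what that swap does to a miniature layout. `struct mini_pack_before` and `struct mini_pack_after` are hypothetical and contain only a few of the real fields, so the offsets here are illustrative, not those of the real `struct thread_options_pack`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of the layout before the patch. */
struct mini_pack_before {
	uint32_t iodepth_batch_complete_min;
	uint32_t iodepth_batch_complete_max;
	uint32_t serialize_overlap;
	uint32_t pad; /* explicit filler ahead of the 64-bit fields */
	uint64_t size;
};

/* Hypothetical miniature of the layout after the patch. */
struct mini_pack_after {
	uint32_t iodepth_batch_complete_min;
	uint32_t iodepth_batch_complete_max;
	uint32_t iodepth_batch_complete_omit; /* new field replaces 'pad' */
	uint32_t serialize_overlap;
	uint64_t size;
};

/* The 64-bit field stays on an 8-byte boundary in both layouts. */
_Static_assert(offsetof(struct mini_pack_before, size) % 8 == 0,
	       "size misaligned before the change");
_Static_assert(offsetof(struct mini_pack_after, size) % 8 == 0,
	       "size misaligned after the change");

/* The overall size is unchanged because the filler was reused. */
_Static_assert(sizeof(struct mini_pack_before) ==
	       sizeof(struct mini_pack_after),
	       "struct size changed");
```

In this miniature the 64-bit fields keep their alignment and the struct size is unchanged, but `serialize_overlap` moves from offset 8 to offset 12. If the real struct behaves the same way, a client and server built from different sides of this patch would disagree on that field's position.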