Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use a :queue to store modules in ExUnit.Server #13636

Merged
merged 3 commits into from
Jun 5, 2024

Conversation

SteffenDE
Copy link
Contributor

In order to have more deterministic test runs when using --max-cases 1 and --max-requires 1 (#13635) (see also #13589), we need to run tests in compilation order (FIFO).

In the past, ExUnit.Server appended new tests to the front of a list, which would result in the most recently added test to be run first.

Let's quickly demonstrate the problem this causes for deterministic runs with a simple example:

Imagine a test (let's call if FooTest) that takes a non-deterministic amount of time to run. For now let's assume that it sometimes takes 1 second and sometimes up to 5. And as async tests execute in parallel with compilation of other test files, we could have the following scenario:

FooTest is compiled and because it's async it is immediately started. It takes 1 second to run.
In this 1 second two more tests are compiled. First BarTest is prepended to the list, then BazTest.
The order of test runs now is:

FooTest, BazTest, then whatever is last compiled while BazTest runs, ...

Now another run, FooTest takes 5 seconds to run.
While FooTest runs, more than two other tests are compiled. The order of test runs is:

FooTest, LastCompiledTest, SecondLastCompiledTest, ..., BazTest, BarTest

This can be fixed either by appending new test modules to the end of the list, or - and that's what this commit does - by using a :queue instead.

In order to have more deterministic test runs when using `--max-cases 1`
and `--max-requires 1` (elixir-lang#13635)
(see also elixir-lang#13589), we need to
run tests in compilation order (FIFO).

In the past, ExUnit.Server appended new tests to the front of a list,
which would result in the most recently added test to be run first.

Let's quickly demonstrate the problem this causes for deterministic runs
with a simple example:

Imagine a test (let's call if FooTest) that takes a non-deterministic
amount of time to run. For now let's assume that it sometimes takes 1
second and sometimes up to 5. And as async tests execute in parallel with
compilation of other test files, we could have the following scenario:

FooTest is compiled and because it's async it is immediately started.
It takes 1 second to run.
In this 1 second two more tests are compiled. First BarTest is prepended
to the list, then BazTest.
The order of test runs now is:

FooTest, BazTest, then whatever is last compiled while BazTest runs, ...

Now another run, FooTest takes 5 seconds to run.
While FooTest runs, more than two other tests are compiled.
The order of test runs is:

FooTest, LastCompiledTest, SecondLastCompiledTest, ..., BazTest, BarTest

This can be fixed either by appending new test modules to the end of the
list, or - and that's what this commit does - by using a `:queue`
instead.
Comment on lines 148 to 150
n = :queue.len(state.async_modules)
count = min(count, n)
{modules, async_modules} = :queue.split(count, state.async_modules)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Length is an expensive operation (linear), what we probably want to do is a "take_until" or "take_while". I have implemented this 10 times in GenStage projects, let me try to find one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find it, sorry. But the idea is to have a loop that pops from the queue, either until nothing is popped or we reach 0. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood 👍

@josevalim
Copy link
Member

@SteffenDE you may have uncovered a bug caused by order dependency between tests. :D We are forgetting to :code.purge/delete Sample somewhere. I can try debugging this later tomorrow, it is definitely not your fault. :)

@SteffenDE
Copy link
Contributor Author

@josevalim I changed all places where :code.delete is called to purge first. I guess sometimes :code.delete returned false and we didn't notice. Works locally now.

@SteffenDE
Copy link
Contributor Author

btw not related to this PR, but every time I run make test locally the following test times out:

  1) test blaming annotates undefined function error with module suggestions (ExceptionTest)
     test/elixir/exception_test.exs:589
     ** (ExUnit.TimeoutError) test timed out after 60000ms. You can change the timeout:

Seems to work in CI, so not sure what's wrong. Running macOS 14, OTP 26.

@josevalim
Copy link
Member

@SteffenDE most likely your OTP version. There is an OTP bug on 26.1 and 26.2, it is addressed on 27 and should be on 26.3 as well.

@josevalim
Copy link
Member

I fixed the other failure, it was caused by ordering as well. :)

@josevalim josevalim merged commit 844c4c3 into elixir-lang:main Jun 5, 2024
6 of 9 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants