use a `:queue` to store modules in ExUnit.Server #13636

SteffenDE · 2024-06-04T15:43:04Z

To get more deterministic test runs when using --max-cases 1 and --max-requires 1 (#13635) (see also #13589), we need to run tests in compilation order (FIFO).

In the past, ExUnit.Server prepended new tests to a list, so the most recently added test was run first.

Let's quickly demonstrate the problem this causes for deterministic runs with a simple example:

Imagine a test (let's call it FooTest) that takes a non-deterministic amount of time to run. For now, let's assume that it sometimes takes 1 second and sometimes up to 5. Because async tests execute in parallel with the compilation of other test files, we could have the following scenario:

FooTest is compiled and because it's async it is immediately started. It takes 1 second to run.
During that second, two more tests are compiled. First BarTest is prepended to the list, then BazTest.
The order of test runs now is:

FooTest, BazTest, then whatever is last compiled while BazTest runs, ...

Now consider another run, in which FooTest takes 5 seconds.
While FooTest runs, more than two other tests are compiled. The order of test runs is:

FooTest, LastCompiledTest, SecondLastCompiledTest, ..., BazTest, BarTest

This can be fixed either by appending new test modules to the end of the list, or - and that's what this commit does - by using a :queue instead.
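
To make the ordering difference concrete, here is a minimal sketch. It is not the actual ExUnit.Server code; the `OrderSketch` module and its function names are hypothetical. It adds BarTest and then BazTest to each structure and returns the order in which they would be taken:

```elixir
defmodule OrderSketch do
  # Modules in the order they finish compiling (names taken from the example above).
  @compiled [BarTest, BazTest]

  # Old behaviour: each new module is prepended, so the head is the newest (LIFO).
  def list_order do
    Enum.reduce(@compiled, [], fn module, acc -> [module | acc] end)
  end

  # New behaviour: each new module enters at the rear of a :queue,
  # so the front is the oldest (FIFO).
  def queue_order do
    @compiled
    |> Enum.reduce(:queue.new(), fn module, queue -> :queue.in(module, queue) end)
    |> :queue.to_list()
  end
end

OrderSketch.list_order()  # [BazTest, BarTest]
OrderSketch.queue_order() # [BarTest, BazTest]
```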

josevalim · 2024-06-04T16:39:22Z

lib/ex_unit/lib/ex_unit/server.ex

+        n = :queue.len(state.async_modules)
+        count = min(count, n)
+        {modules, async_modules} = :queue.split(count, state.async_modules)


Length is an expensive operation (linear). What we probably want is a "take_until" or "take_while". I have implemented this 10 times in GenStage projects, so let me try to find one.

I couldn't find it, sorry. But the idea is to have a loop that pops from the queue until either nothing is popped or the count reaches 0. :)

Understood 👍
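
A minimal sketch of the loop described above, assuming a hypothetical `QueueTakeSketch.take/2` helper; the code that was eventually merged may differ. It pops with `:queue.out/1` until the count reaches 0 or the queue is empty, so it never needs `:queue.len/1`:

```elixir
defmodule QueueTakeSketch do
  # Takes up to `count` items from the front of `queue`.
  # Returns {items_in_fifo_order, remaining_queue}.
  def take(queue, count) when is_integer(count) and count >= 0 do
    take(queue, count, [])
  end

  # The requested number of items has been reached.
  defp take(queue, 0, acc), do: {Enum.reverse(acc), queue}

  defp take(queue, count, acc) do
    case :queue.out(queue) do
      # Popped one item: keep going with a smaller count.
      {{:value, item}, rest} -> take(rest, count - 1, [item | acc])
      # Nothing left to pop: stop early.
      {:empty, rest} -> {Enum.reverse(acc), rest}
    end
  end
end

queue = :queue.from_list([FooTest, BarTest, BazTest])
{modules, rest} = QueueTakeSketch.take(queue, 2)
# modules is [FooTest, BarTest] and :queue.to_list(rest) is [BazTest]
```

The work done is proportional to the number of items taken rather than to the size of the whole queue.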

josevalim · 2024-06-04T21:46:59Z

@SteffenDE you may have uncovered a bug caused by order dependency between tests. :D We are forgetting to :code.purge/delete Sample somewhere. I can try debugging this later tomorrow, it is definitely not your fault. :)

SteffenDE · 2024-06-05T08:31:38Z

@josevalim I changed all places where :code.delete is called to purge first. I guess sometimes :code.delete returned false and we didn't notice. Works locally now.
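
For illustration, purging before deleting could look like the sketch below. `PurgeSketch.purge_and_delete/1` is a hypothetical helper, not the code changed in this PR. According to the Erlang documentation, `:code.delete/1` returns false when old code for the module still has to be purged, which is why the purge comes first:

```elixir
defmodule PurgeSketch do
  # Removes any old version of `module`, then marks the current version as old.
  # Returns the result of :code.delete/1, so callers can notice a failed delete.
  def purge_and_delete(module) when is_atom(module) do
    # Drop old code first; otherwise :code.delete/1 may return false.
    :code.purge(module)
    :code.delete(module)
  end
end
```

Returning the result of `:code.delete/1` rather than discarding it makes a silent `false` easier to spot in tests.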

SteffenDE · 2024-06-05T08:33:59Z

btw not related to this PR, but every time I run make test locally the following test times out:

  1) test blaming annotates undefined function error with module suggestions (ExceptionTest)
     test/elixir/exception_test.exs:589
     ** (ExUnit.TimeoutError) test timed out after 60000ms. You can change the timeout:

Seems to work in CI, so not sure what's wrong. Running macOS 14, OTP 26.

josevalim · 2024-06-05T08:50:19Z

@SteffenDE most likely your OTP version. There is an OTP bug on 26.1 and 26.2, it is addressed on 27 and should be on 26.3 as well.

josevalim · 2024-06-05T09:20:49Z

I fixed the other failure, it was caused by ordering as well. :)

josevalim · 2024-06-05T09:21:10Z

💚 💙 💜 💛 ❤️

josevalim reviewed Jun 4, 2024

recursive :queue.out instead of :queue.len (46cf62e)

always purge old modules before deleting (19c5afc)

josevalim merged commit 844c4c3 into elixir-lang:main Jun 5, 2024
6 of 9 checks passed
