
Retry failed tests automatically #658

Open
harikamankala opened this issue Oct 28, 2024 · 7 comments

@harikamankala

🤔 What's the problem you're trying to solve?

It is always frustrating when a test fails for no apparent reason and then works fine on a retry. When there are thousands of tests running in parallel, it is common for one test or another to fail, and there can be multiple reasons for the failure, such as network slowness or an overloaded server.

✨ What's your proposed solution?

It would be really helpful to have a feature for rerunning failed tests.
A rerun option is also provided by other tools, for example:

ctest -> https://cmake.org/cmake/help/v3.17/manual/ctest.1.html ( --repeat until-pass:n )
maven-surefire-plugin -> https://maven.apache.org/surefire/maven-surefire-plugin/examples/rerun-failing-tests.html

I have also seen a similar request here: cucumber-rs/cucumber#212

It would be really helpful if you could consider this. Thank you :)

⛏ Have you considered any alternatives or workarounds?

No response

📚 Any additional context?

No response

nhv96 commented Oct 28, 2024

I want to share my experience with tests that sometimes fail for no apparent reason (aka flaky tests), and how my team and I dealt with it.
We use godog for BDD testing of our product, which is built on top of a microservices architecture. We have thousands of features, and even though we tried hard to keep each of them isolated from the others in terms of context, there are still many reasons a test can fail. The most common are:

  • Async read/write in the DB layer: one step is supposed to insert data into a table and another step reads it back, but sometimes the logic of the feature runs concurrently and the second step fails to read that data, causing a failure.
  • Race conditions in the DB layer: as you can tell, other features may happen to read/write the same data in the DB, causing unexpected behavior.
  • Async pub/sub: a feature is supposed to verify that service A can send a message to service B via Kafka, so the first step has service A publish a message and the second step checks that service B has received it from Kafka. When the system runs under high load, the message can sometimes arrive late at service B, causing the test to fail.

And there are more...

But as users of the library, we did not expect anything more from godog. The problem here is clearly that we did not design our system well enough, did not thoroughly think through the real scenarios when implementing the logic, and did not prepare our tests well (we should have used hooks to set up the DB, given each feature or service its own environment to run in, etc.).
Thanks to flaky tests like these, we learned what problems existed in our system: if something can occur in tests, it will definitely occur in production. And yes, we did have some incidents, and avoided some others, thanks to godog.

What we did back then to deal with the flaky tests:

  • Review the system design of the feature (not the .feature files) again and try to understand it fully; if there are many edge cases that can happen or that are risky, we refactor it.
  • If it is something we can't do anything about, say service A, which sends messages to us (service B), is owned by a different team, but we still need the CI test pipeline to pass and not block everyone else, then we implement a retry mechanism in the test logic ourselves (see the sketch below). We know that the problem can happen, but it may not be worth spending much resource on fixing the bigger system, and in the production environment we already have other means to prevent it, so that is an acceptable workaround.
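
Roughly, such an in-step retry can look like the following sketch; waitFor and findMessage are illustrative helpers, not godog APIs. The step polls for the expected state until a deadline instead of asserting once:

```go
package bdd

import (
	"context"
	"fmt"
	"time"

	"github.com/cucumber/godog"
)

// waitFor polls check until it succeeds or the timeout expires.
// This is plain test logic layered on top of godog, not a godog feature.
func waitFor(timeout, interval time.Duration, check func() error) error {
	deadline := time.Now().Add(timeout)
	var err error
	for {
		if err = check(); err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("condition not met after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

// findMessage stands in for whatever lookup the suite really does,
// e.g. querying the consumer's store for a Kafka message.
var findMessage = func(ctx context.Context, id string) error {
	return fmt.Errorf("message %q not found yet", id)
}

// serviceBReceivedMessage tolerates async delivery delays by polling
// instead of asserting once.
func serviceBReceivedMessage(ctx context.Context, id string) error {
	return waitFor(10*time.Second, 200*time.Millisecond, func() error {
		return findMessage(ctx, id)
	})
}

func InitializeScenario(sc *godog.ScenarioContext) {
	sc.Step(`^service B has received message "([^"]*)"$`, serviceBReceivedMessage)
}
```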

So IMO, godog is currently doing a very good job by telling us which feature is going wrong under its parallel execution. It is our responsibility, when using it to test our code, to fully understand how the code runs; we cannot expect a library to retry errors caused by our own bad implementation.

nhv96 commented Oct 28, 2024

Another perspective, though: we also went through a period when the product's features expanded so fast that the team didn't have enough time to look into each failed test and resolve them all. Our CI failed from time to time, and everyone went with "just rerun the test pipeline!!!" or flagged the feature as @quarantined so godog wouldn't run it.

So retrying failed tests seems like a valid request as well, if only for the ability to mark something as flaky but not too error-prone, so that the team can work on it when they have time 😄

nhv96 commented Nov 15, 2024

Hi @vearutop, I'd like to work on this issue. Do you have any concerns?

vearutop (Member) commented Nov 16, 2024

The topic of retries is a bit broad. There is an accepted proposal for Go in this area: golang/go#62244 (comment).

I'm not sure what the most useful implementation would be in our domain. I see a few ways to approach this.

  • Expect godog.ErrRetry from the step function (probably together with r := godog.Retries(ctx) to get the current attempt number in the step function). ErrRetry would stop further scenario execution and schedule it for a retry. This allows precise control from the step function, but it is probably not convenient, especially if the step is defined in a 3rd-party lib. I think this idea is not so great.

  • Have an explicit statement in the scenario, e.g.

When A must pass
And B must pass
And if further steps fail, retry up to 5 times
And C should pass
And D should pass
And E must pass

Here should pass indicates flaky steps that could benefit from retries.

Such a strategy could potentially be implemented with something like return godog.AllowRetries(ctx, 5) from the if further steps fail, retry up to X times step definition. This approach gives some level of precision (a failure in E must pass would still trigger a retry). It allows "quickly fixing" flaky scenarios by injecting a domain-agnostic step at the point where the flakiness begins. Having r := godog.Retries(ctx) would make sense here too, to allow a simple implementation of exponential backoff in steps (e.g. increasing timeouts on retry). This is my preferred implementation strategy for its balance of simplicity and precise control (see the sketch below).

  • Allow scenario retries based on tag, e.g. godog.Options{MaxRetries: 5, RetryTags:[]string{"flaky"}}.
@flaky
Scenario: passing mine field
 When I move ahead
 Then I stay alive

In terms of test suite maintenance complexity, this seems similar to the explicit statement above, but precision and control are lower. Though the @flaky tag is easier to spot when skimming through the Feature.

  • Use a global retry policy, e.g. godog.Options{MaxRetries: 5}. This is the easiest to enable, but also the least precise. If a scenario fails for non-flaky reasons, it will still have to run MaxRetries times. This policy may also discourage fixing actual flaky tests, since they will go unnoticed.

These strategies do not conflict with each other and could potentially even coexist, but it would probably be best to have just "one good way" of tackling the issue.
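
To make the preferred option a bit more concrete, a rough sketch of how it could look from the step side. godog.AllowRetries and godog.Retries are the proposed helpers (they don't exist in godog today, and their exact signatures are undecided), and checkC is just a placeholder for real step logic:

```go
// Sketch of the proposed API only; none of this exists in godog yet.
func InitializeScenario(sc *godog.ScenarioContext) {
	// Backs the "if further steps fail, retry up to X times" statement:
	// it marks the remainder of the scenario as retryable.
	sc.Step(`^if further steps fail, retry up to (\d+) times$`,
		func(ctx context.Context, n int) (context.Context, error) {
			return godog.AllowRetries(ctx, n), nil
		})

	// A flaky step using the current attempt number for a simple backoff,
	// e.g. a longer timeout on every retry. checkC is a placeholder.
	sc.Step(`^C should pass$`, func(ctx context.Context) error {
		attempt := godog.Retries(ctx) // 0 on the first attempt
		timeout := time.Duration(attempt+1) * 5 * time.Second
		return checkC(ctx, timeout)
	})
}
```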

@nhv96 what do you think should be the implementation strategy (and please share if you have other ideas for it)?

@luke-hill (Contributor)

So the accepted retry behaviour is that you pass a number into the CLI arg (e.g. 2), and it will then retry "each" test case up to that many times. The result of a test case is the least-worst result (noting that once you have a pass, you are fine).

Ruby has a separate, unique binding that puts a maximum cap on retries (but this shouldn't be seen as the standard-bearer, and it has been rejected by other flavours).

TL;DR - retry is a common flag used in most flavours of cucumber and is part of the compatibility-kit

nhv96 commented Nov 18, 2024

@vearutop Thank you for pointing me to the Go discussion, it is very insightful.

My initial approach was to add a global flag in godog.Options, as you mentioned. But your concern that doing so might make developers lazy, so that they won't actually look at the test results, is also a valid point.

The idea of explicit statements in the .feature file is easy to read and understand, but my concern is whether it actually adds value to the specification of the feature. People from other teams might question whether these steps belong in the file.

This leaves the other two options: using godog.ErrRetry or @flaky tags seems more beneficial for the framework as well as for users. What both approaches have in common is that they give users more control over the code being tested, and they can instruct godog what to do.

If I could only choose one, I'd prefer the @flaky tags: they don't introduce extra steps into the feature, yet they still give enough visibility to understand which tests should be focused on.

Also, I believe the return godog.AllowRetries(ctx, 5) approach is viable without having to add statements to the .feature file, right? As long as the logic can run, retry, and then pass as expected, we can still offer users this solution. 🤔
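
For example, something like this rough sketch, assuming the proposed godog.AllowRetries existed, could key retries off the @flaky tag from a regular Before hook, so nothing extra appears in the .feature file:

```go
// Sketch only: godog.AllowRetries is the proposed API from this issue,
// not something godog provides today. The Before hook and scenario tags
// are existing godog features.
func InitializeScenario(sc *godog.ScenarioContext) {
	sc.Before(func(ctx context.Context, s *godog.Scenario) (context.Context, error) {
		for _, tag := range s.Tags {
			if tag.Name == "@flaky" {
				// The retry budget (3) is an arbitrary example value.
				return godog.AllowRetries(ctx, 3), nil
			}
		}
		return ctx, nil
	})
}
```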

@vearutop (Member)

Makes sense to me, please proceed with this if you have capacity.
