
Revert "remove incorrect end-to-end implementation" #77

Closed

Conversation

russellb
Member

This reverts commit ebc5e31 from #76.

Here is my comment on #76 that better explains how this workflow fits
into the bigger picture:

> Testing the user end-to-end instructlab workflow is the point of
> this workflow. It will give you an early indication if your changes to
> the library break the CLI. Sometimes that might be on purpose, but in
> case it's not, it will let you know before you publish a release.
>
> This is not intended to replace other functional testing aimed at the
> library more directly, but the full end-to-end workflow is important.
>
> I'm OK if you want to disable it for the moment, but it needs to go
> back on at some point. A better way to do that is to just edit the
> workflow to turn it off instead of removing it from the repo.
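
A minimal sketch of that last suggestion, turning the workflow off in place instead of deleting the file; the workflow name, job name, and steps here are hypothetical, not taken from this repo:

```yaml
name: E2E test  # hypothetical workflow name
on:
  pull_request:

jobs:
  e2e:
    # Temporarily disabled; remove this condition (or set it to true)
    # to re-enable the job without having to restore a deleted file.
    if: ${{ false }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "e2e steps would run here"
```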

Issue #63

Signed-off-by: Russell Bryant [email protected]

@russellb
Member Author

russellb commented Jun 26, 2024

You don't have to merge this right away if it was causing a short-term problem, but I'm staging it here for your convenience.

An alternative: I can adjust this so it doesn't run automatically, but instead runs on demand when someone wants to run it.
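
A minimal sketch of what that on-demand option might look like in the workflow's trigger block, assuming GitHub Actions' manual workflow_dispatch trigger (the rest of the workflow is unchanged and not shown):

```yaml
on:
  # No automatic pull_request/push triggers; a maintainer starts the run
  # from the Actions tab (or via the API) whenever they want to exercise it.
  workflow_dispatch:
```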

@RobotSail left a comment
Member

Some thoughts:

  1. We want the end-to-end tests to be structured in such a way that we have a clean file consuming the latest version of our training library. The tests should live in this repo so that they can be adjusted, and the proper advisories can be made with regard to updating the upstream CLI repo.
  2. We are not planning on using the spell-checker going forward.

@n1hility
Member

So while we might have chosen to break things into separate repos, I don't think we agreed to run them as completely independent projects with their own processes and governance. We still need to share and build off of each other IMO.

@RobotSail
Member

@n1hility I don't disagree, and would like to introduce e2e tests that don't break consuming libraries. The issue with this implementation is that these tests live beyond our repo, and therefore are very fragile. We would like to keep everything within the same repo. Happy to implement this myself after the 15th.

@russellb
Member Author

Relevant note: this test script has not been updated to use the new training code. It was changed to use legacy mode when new training was merged into the CLI. At the moment, the test coverage would be ensuring that everything can still be installed together, including whatever changes are present in a training PR.

Once this issue is resolved, it would also ensure that `ilab model train` works, at least in some basic configuration.

instructlab/instructlab#1470

@n1hility
Member

> @n1hility I don't disagree, and would like to introduce e2e tests that don't break consuming libraries. The issue with this implementation is that these tests live beyond our repo, and therefore are very fragile. We would like to keep everything within the same repo. Happy to implement this myself after the 15th.

Can you elaborate how it's fragile? This is a pretty common practice for e2e testing the integration of a library into a primary consumer: you run a smoke test of the consumer in your project so that you catch compatibility problems.

@RobotSail
Member

> Can you elaborate how it's fragile?

Having end-to-end tests in a separate repo means that if we make a breaking change, we would need to fix the tests in the upstream to get them working on our end. It makes it so we are coupled to the upstream repo, and if these tests are to break for any reason, this would impact our PR status downstream.

On the other hand, defining an end-to-end test suite within this repo allows us to ensure that it's reflective of how we intend this library to be used. This also makes it much more straightforward to develop the tests as we build out the library.

In either case we'd be able to detect breakages, but one allows us much more freedom with how we go about testing the library.

@nathan-weinberg
Member

I'll just add, on the debate over the value of the E2E tests, that we've added them to the Eval repo and they have already helped identify the need for a dependency bump that @danmcp implemented: instructlab/eval#14 (comment)

@n1hility
Member

> Can you elaborate how it's fragile?
>
> Having end-to-end tests in a separate repo means that if we make a breaking change, we would need to fix the tests in the upstream to get them working on our end. It makes it so we are coupled to the upstream repo, and if these tests are to break for any reason, this would impact our PR status downstream.
>
> On the other hand, defining an end-to-end test suite within this repo allows us to ensure that it's reflective of how we intend this library to be used. This also makes it much more straightforward to develop the tests as we build out the library.
>
> In either case we'd be able to detect breakages, but one allows us much more freedom with how we go about testing the library.

Thanks for sharing your thinking. What you describe is really mock integration testing rather than a form of e2e testing. Both have their uses and are complementary, but mock testing trades real-world accuracy for speed, since you are only testing what you anticipate the interaction to be vs what it actually is. That makes a lot of sense when e2e is too slow for PR testing, since it's better to run it nightly and run unit + mock tests in PRs. However, in this case it's a fast test suite (only 30 mins, mostly from the build), and the alternative we have right now is no testing. Granted, this is only testing the legacy training path, so its value is more limited than what it could be. However, it will catch API breakage, dependency issues, and some interaction problems.

While a planned API change will cause these to fail, it's pretty easy to suspend them, or alternatively sustain compatibility until you update the dependent ilab project.

@RobotSail
Member

RobotSail commented Jun 27, 2024

A few things:

  • 30 mins is way too slow for an end-to-end test. At the pace we push PRs, these will be reset often and will likely go to waste.
  • The current e2e test which was added still failed to catch a bug, since it showed everything passing successfully.
  • Presently, we enforce our library's interface via Pydantic, so every option will either take its default or raise an error. Beyond that, once `run_training` is invoked, everything it runs lives in this repo, meaning that both end-to-end tests would be checking the same functionality.

@cdoern
Contributor

cdoern commented Jun 27, 2024

> 30 mins is way too slow for an end-to-end test. At the pace we push PRs, these will be reset often and will likely go to waste.

This is a pretty standard (if not fast) time for an e2e; I have seen ones in OCP that take hours.

Sorry to jump into the convo super late here, but in all honesty I think what this entire conversation boils down to is:

speed of development vs quality of code

We have been struggling to find a balance here, so this is what I would suggest: I think we should add back the e2e, but with the full understanding that it might break sometimes, with immediate or pretty quick fixes to follow. As long as we can use the test to pinpoint which changes introduced an issue, that is better than no e2e AND it is better than stalling necessary development.

We need to find a middle ground here, adding the test but understanding the circumstances seems pretty measured to me.

I understand both perspectives here and I am open to any feedback on this stance!

@russellb russellb force-pushed the revert-revert-revert-revert-e2e branch from f9bfcee to 5f4d371 on June 27, 2024 at 15:53
@russellb
Member Author

Updated the PR to drop re-adding spellcheck. It was included in the commit I was reverting, but is not in scope for what I really care about here.

@nathan-weinberg nathan-weinberg dismissed their stale review June 27, 2024 17:32

Review is out-of-date

@RobotSail
Member

So after a thorough discussion among the training team, our decision is as follows:

  • It's OK if these tests live in the instructlab repo
  • The end-to-end tests must use our training library in the case of training
  • Optimizations should be made so that the end-to-end test takes the minimum amount of time needed to train. This shouldn't be a 30-minute process if at all possible.

We feel that this is only fair considering these tests live beyond our repository.

This reverts commit ebc5e31 from instructlab#76.

Here is my comment on instructlab#76 that better explains how this workflow fits
in to the bigger picture:

> Testing the user end-to-end instructlab workflow is the point of
> this workflow. It will give you an early indication if your changes to
> the library break the CLI. Sometimes that might be on purpose, but in
> case it's not, it will let you know before you publish a release.
>
> This is not intended to replace other functional testing aimed at the
> library more directly, but the full end-to-end workflow is important.
>
> I'm OK if you want to disable it for the moment, but it needs to go
> back on at some point. A better way to do that is to just edit the
> workflow to turn it off instead of removing it from the repo.

The previous commit also removed spellcheck (though it wasn't
mentioned). I have left it out in this PR.

Issue instructlab#63

Signed-off-by: Russell Bryant <[email protected]>
@russellb russellb force-pushed the revert-revert-revert-revert-e2e branch from 5f4d371 to eb71ef7 on June 28, 2024 at 00:15
@RobotSail
Member

After further discussion with the relevant parties, we came to the conclusion that these e2e tests will be going in.

@RobotSail
Member

This PR can be merged after instructlab/instructlab#1494

@booxter
Contributor

booxter commented Jul 26, 2024

@RobotSail can this merge now?

@RobotSail
Member

@russellb Could you please update this test so that it's running with the full training flag, e.g. -fT?

https://github.com/instructlab/instructlab/blob/323e503002ca314c7b44ec46fd5b00318a960cee/scripts/basic-workflow-tests.sh#L480

```yaml
- name: Run e2e test
  run: |
    . venv/bin/activate
    ./instructlab/scripts/basic-workflow-tests.sh -cm
```
Member
Please run this with -fT: -f for fulltrain and -T for the training library.
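
If that request were applied, the step might look like the sketch below. This simply swaps the flags; whether any of the original -cm options should be kept alongside -fT is left open here, since the review comment only names -fT:

```yaml
- name: Run e2e test
  run: |
    . venv/bin/activate
    # -f selects fulltrain and -T selects the training library, per the review request
    ./instructlab/scripts/basic-workflow-tests.sh -fT
```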

@russellb
Member Author

russellb commented Aug 9, 2024

This has gone stale at this point.

I would suggest considering an EC2-based workflow instead of this one, which runs on a GitHub-hosted runner. That might as well be a new PR.
