
Revert "remove incorrect end-to-end implementation" #77

Closed

Conversation

russellb
Member

This reverts commit ebc5e31 from #76.

Here is my comment on #76 that better explains how this workflow fits
into the bigger picture:

> Testing the user end-to-end instructlab workflow is the point of
> this workflow. It will give you an early indication if your changes to
> the library break the CLI. Sometimes that might be on purpose, but in
> case it's not, it will let you know before you publish a release.
>
> This is not intended to replace other functional testing aimed at the
> library more directly, but the full end-to-end workflow is important.
>
> I'm OK if you want to disable it for the moment, but it needs to go
> back on at some point. A better way to do that is to just edit the
> workflow to turn it off instead of removing it from the repo.
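
A minimal sketch of that last suggestion, turning the workflow off in place instead of deleting the file; the workflow name, job name, and steps here are hypothetical, not taken from this repo:

```yaml
name: E2E test  # hypothetical workflow name
on:
  pull_request:

jobs:
  e2e:
    # Temporarily disabled; remove this condition (or set it to true)
    # to re-enable the job without having to restore a deleted file.
    if: ${{ false }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "e2e steps would run here"
```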

Issue #63

Signed-off-by: Russell Bryant [email protected]

@russellb
Member Author

russellb commented Jun 26, 2024

You don't have to merge this right away if it was causing a short-term problem, but I'm staging it here for your convenience.

An alternative: I can adjust this so it doesn't run automatically, but instead runs on demand when someone wants to run it.
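
A minimal sketch of what that on-demand option might look like in the workflow's trigger block, assuming GitHub Actions' manual workflow_dispatch trigger (the rest of the workflow is unchanged and not shown):

```yaml
on:
  # No automatic pull_request/push triggers; a maintainer starts the run
  # from the Actions tab (or via the API) whenever they want to exercise it.
  workflow_dispatch:
```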

@RobotSail left a comment
Member

Some thoughts:

  1. We want the end-to-end tests to be structured in such a way that we have a clean file consuming the latest version of our training library. The tests should live in this repo so that they can be adjusted, and the proper advisories can be made with regard to updating the upstream CLI repo.
  2. We are not planning on using the spell-checker going forward.

@n1hility
Member

So while we might have chosen to break things into separate repos, I don't think we agreed to run them as completely independent projects with their own processes and governance. We still need to share and build off of each other IMO.

@RobotSail
Member

@n1hility I don't disagree, and would like to introduce e2e tests that don't break consuming libraries. The issue with this implementation is that these tests live beyond our repo, and therefore are very fragile. We would like to keep everything within the same repo. Happy to implement this myself after the 15th.

@russellb
Member Author

Relevant note: this test script has not been updated to use the new training code. It was changed to use legacy mode when new training was merged into the CLI. At the moment, the test coverage would be ensuring that everything can still be installed together, including whatever changes are present in a training PR.

Once this issue is resolved, it would also ensure that `ilab model train` works, at least in some basic configuration.

instructlab/instructlab#1470

@n1hility
Member

> @n1hility I don't disagree, and would like to introduce e2e tests that don't break consuming libraries. The issue with this implementation is that these tests live beyond our repo, and therefore are very fragile. We would like to keep everything within the same repo. Happy to implement this myself after the 15th.

Can you elaborate how it's fragile? This is a pretty common practice for e2e testing the integration of a library into a primary consumer: you run a smoke test of the consumer in your project so that you catch compatibility problems.

@RobotSail
Member

> Can you elaborate how it's fragile?

Having end-to-end tests in a separate repo means that if we make a breaking change, we would need to fix the tests in the upstream to get them working on our end. It makes it so we are coupled to the upstream repo, and if these tests are to break for any reason, this would impact our PR status downstream.

On the other hand, defining an end-to-end test suite within this repo allows us to ensure that it's reflective of how we intend this library to be used. This also makes it much more straightforward to develop the tests as we build out the library.

In either case we'd be able to detect breakages, but one allows us much more freedom with how we go about testing the library.

@nathan-weinberg
Member

I'll just add, on the debate over the value of the E2E tests, that we've added them to the Eval repo and they have already helped identify the need for a dependency bump that @danmcp implemented: instructlab/eval#14 (comment)

@n1hility
Member

> Can you elaborate how it's fragile?
>
> Having end-to-end tests in a separate repo means that if we make a breaking change, we would need to fix the tests in the upstream to get them working on our end. It makes it so we are coupled to the upstream repo, and if these tests are to break for any reason, this would impact our PR status downstream.
>
> On the other hand, defining an end-to-end test suite within this repo allows us to ensure that it's reflective of how we intend this library to be used. This also makes it much more straightforward to develop the tests as we build out the library.
>
> In either case we'd be able to detect breakages, but one allows us much more freedom with how we go about testing the library.

Thanks for sharing your thinking. What you describe is really mock integration testing rather than a form of e2e testing. Both have their uses and are complementary, but mock testing trades real-world accuracy for speed, since you are only testing what you anticipate the interaction to be vs what it actually is. That makes a lot of sense when e2e is too slow for PR testing, since it's better to run it nightly and run unit + mock tests in PRs. However, in this case it's a fast test suite (only 30 mins, mostly from the build), and the alternative we have right now is no testing. Granted, this is only testing the legacy training path, so its value is more limited than what it could be. However, it will catch API breakage, dependency issues, and some interaction problems.

While a planned API change will cause these to fail, it's pretty easy to suspend them, or alternatively sustain compatibility until you update the dependent ilab project.

@RobotSail
Member

RobotSail commented Jun 27, 2024

A few things:

  • 30 mins is way too slow for an end-to-end test. At the pace we push PRs, these will be reset often and will likely go to waste.
  • The current e2e test which was added still failed to catch a bug, since it showed everything passing successfully.
  • Presently, we enforce our library's interface via Pydantic, so every option will either take its default or raise an error. Beyond that, once `run_training` is invoked, everything it runs lives in this repo, meaning that both end-to-end tests would be checking the same functionality.

@cdoern
Contributor

cdoern commented Jun 27, 2024

> 30 mins is way too slow for an end-to-end test. At the pace we push PRs, these will be reset often and will likely go to waste.

This is a pretty standard (if not fast) time for an e2e; I have seen ones in OCP that take hours.

Sorry to jump into the convo super late here, but in all honesty I think what this entire conversation boils down to is:

speed of development vs quality of code

We have been struggling to find a balance here, so this is what I would suggest: I think we should add back the e2e, but with the full understanding that it might break sometimes, with immediate or pretty quick fixes to follow. As long as we can use the test to pinpoint which changes introduced an issue, that is better than no e2e AND it is better than stalling necessary development.

We need to find a middle ground here, adding the test but understanding the circumstances seems pretty measured to me.

I understand both perspectives here and I am open to any feedback on this stance!

@russellb russellb force-pushed the revert-revert-revert-revert-e2e branch from f9bfcee to 5f4d371 on June 27, 2024 at 15:53
@russellb
Member Author

Updated the PR to drop re-adding spellcheck. It was included in the commit I was reverting, but is not in scope for what I really care about here.

@nathan-weinberg nathan-weinberg dismissed their stale review June 27, 2024 17:32

Review is out-of-date

@RobotSail
Member

So after a thorough discussion among the training team, our decision is as follows:

  • It's OK if these tests live in the instructlab repo
  • The end-to-end tests must use our training library in the case of training
  • Optimizations should be made so that the end-to-end test takes the minimum amount of time needed to train. This shouldn't be a 30-minute process if at all possible.

We feel that this is only fair considering these tests live beyond our repository.

This reverts commit ebc5e31 from instructlab#76.

Here is my comment on instructlab#76 that better explains how this workflow fits
in to the bigger picture:

> Testing the user end-to-end instructlab workflow is the point of
> this workflow. It will give you an early indication if your changes to
> the library break the CLI. Sometimes that might be on purpose, but in
> case it's not, it will let you know before you publish a release.
>
> This is not intended to replace other functional testing aimed at the
> library more directly, but the full end-to-end workflow is important.
>
> I'm OK if you want to disable it for the moment, but it needs to go
> back on at some point. A better way to do that is to just edit the
> workflow to turn it off instead of removing it from the repo.

The previous commit also removed spellcheck (though it wasn't
mentioned). I have left it out in this PR.

Issue instructlab#63

Signed-off-by: Russell Bryant <[email protected]>
@russellb russellb force-pushed the revert-revert-revert-revert-e2e branch from 5f4d371 to eb71ef7 on June 28, 2024 at 00:15
@RobotSail
Member

After further discussion with the relevant parties, we came to the conclusion that these e2e tests will be going in.

@RobotSail
Member

This PR can be merged after instructlab/instructlab#1494

@booxter
Contributor

booxter commented Jul 26, 2024

@RobotSail can this merge now?

@RobotSail
Member

@russellb Could you please update this test so that it's running with the full training flag, e.g. -fT?

https://github.com/instructlab/instructlab/blob/323e503002ca314c7b44ec46fd5b00318a960cee/scripts/basic-workflow-tests.sh#L480

```yaml
- name: Run e2e test
  run: |
    . venv/bin/activate
    ./instructlab/scripts/basic-workflow-tests.sh -cm
```
Member
Please run this with -fT: -f for fulltrain and -T for the training library.
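
If that request were applied, the step might look like the sketch below. This simply swaps the flags; whether any of the original -cm options should be kept alongside -fT is left open here, since the review comment only names -fT:

```yaml
- name: Run e2e test
  run: |
    . venv/bin/activate
    # -f selects fulltrain and -T selects the training library, per the review request
    ./instructlab/scripts/basic-workflow-tests.sh -fT
```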

@russellb
Member Author

russellb commented Aug 9, 2024

This has gone stale at this point.

I would suggest considering an EC2-based workflow instead of this one, which runs on a GitHub-hosted runner. That might as well be a new PR.
