Add Evaluations area with Weave as a supported eval flow #917

johndmulhausen · 2024-11-07T21:59:08Z

Several areas that describe the W&B ecosystem still reflect the days when we were a one-product company and didn't have Weave to offer as part of the evaluation story. This PR integrates Weave more fully into the ecosystem and highlights our Evaluation options (Tables, Weave) in a new high-level section.

Resolves DOCS-1024

cloudflare-workers-and-pages · 2024-11-07T21:59:30Z

Deploying docodile with Cloudflare Pages

Latest commit:	`3e3f4da`
Status:	✅ Deploy successful!
Preview URL:	https://5853e7dc.docodile.pages.dev
Branch Preview URL:	https://weave-evaluations.docodile.pages.dev

docs/guides/intro.md

docs/guides/evaluations/evaluate-models-weave.md

github-actions · 2024-11-14T19:49:41Z

docs/guides/evaluations/evaluate-models-weave.md

+
+First, create a W&B account at https://wandb.ai and copy your API key from https://wandb.ai/authorize.
+
+Then, follow along in the Colab notebook below, which demonstrates Weave evaluating an LLM (in this case, OpenAI, for which you will also need [an API key](https://platform.openai.com/docs/quickstart/step-2-setup-your-api-key)).


⚠️ [vale] (reported by reviewdog 🐶)
[Google.Will] Avoid using 'will'.

github-actions · 2024-11-19T07:06:38Z

docs/guides/evaluations/evaluate-models-weave.md

+
+## Use Weave to evaluate models in production
+
+This [tutorial on how to build an evaluation pipeline with Weave](https://weave-docs.wandb.ai/tutorial-eval/) shows how to evaluate multiple versions of an application that uses a model as it evolves, using the `weave.Evaluation` class, which assesses a Model's performance on a set of examples using a list of specified scoring functions or `weave.scorer.Scorer` classes, and produces dashboards with advanced breakdowns of the model's performance.


⚠️ [vale] (reported by reviewdog 🐶)
[Google.WordList] Use 'app' instead of 'application'.
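
For readers of this thread, the flow that paragraph describes (run a model over a set of examples, apply a list of scoring functions, aggregate the results) can be sketched in plain Python. This is only an illustration of the idea under stated assumptions, not Weave's API: `evaluate`, `exact_match`, `toy_model_v1`, `toy_model_v2`, and the example data are all hypothetical names made up for this sketch, and nothing here logs to W&B or produces a dashboard.

```python
from typing import Any, Callable, Dict, List

# Hypothetical types for this sketch; none of these come from Weave.
Example = Dict[str, Any]
Scorer = Callable[[Example, str], Dict[str, float]]


def evaluate(
    model: Callable[[str], str],
    examples: List[Example],
    scorers: List[Scorer],
) -> Dict[str, float]:
    """Run the model on every example and average each scorer's metrics."""
    if not examples:
        raise ValueError("at least one example is required")
    totals: Dict[str, float] = {}
    for example in examples:
        output = model(example["question"])
        for scorer in scorers:
            for name, value in scorer(example, output).items():
                totals[name] = totals.get(name, 0.0) + float(value)
    return {name: total / len(examples) for name, total in totals.items()}


def exact_match(example: Example, output: str) -> Dict[str, float]:
    """Score 1.0 when the output matches the expected answer, ignoring case."""
    matched = output.strip().lower() == example["expected"].strip().lower()
    return {"exact_match": float(matched)}


# Made-up evaluation set for illustration only.
EXAMPLES: List[Example] = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "What is the capital of Japan?", "expected": "Tokyo"},
]


def toy_model_v1(question: str) -> str:
    """First 'version' of the app: knows a single answer."""
    return {"What is the capital of France?": "Paris"}.get(question, "unknown")


def toy_model_v2(question: str) -> str:
    """Second 'version' of the app: knows both answers."""
    answers = {
        "What is the capital of France?": "paris",
        "What is the capital of Japan?": "Tokyo",
    }
    return answers.get(question, "unknown")


if __name__ == "__main__":
    for name, model in [("v1", toy_model_v1), ("v2", toy_model_v2)]:
        print(name, evaluate(model, EXAMPLES, [exact_match]))
```

Running the same examples and scorers against each version is what makes the versions comparable; the linked tutorial covers how Weave itself records and displays such runs.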

github-actions · 2024-11-19T07:06:54Z

Summary

| Status | Count |
|---|---|
| 🔍 Total | 34 |
| ✅ Successful | 16 |
| ⏳ Timeouts | 0 |
| 🔀 Redirected | 0 |
| 👻 Excluded | 17 |
| ❓ Unknown | 0 |
| 🚫 Errors | 1 |

Errors per input

Errors in docs/guides/evaluations/evaluate-models-weave.md

[403] https://platform.openai.com/docs/quickstart/step-2-setup-your-api-key | Failed: Network error: Forbidden

Add Evaluations area with Weave

1561468

johndmulhausen requested a review from a team as a code owner November 7, 2024 21:59

johndmulhausen added the WIP label Nov 7, 2024

Merge branch 'main' into weave-evaluations

70fc691

github-actions bot reviewed Nov 7, 2024

docs/guides/intro.md (resolved)

ngrayluna and others added 2 commits November 8, 2024 10:08

Merge branch 'main' into weave-evaluations

2cf43a5

Evaluate Models with Weave page

8d0844e

github-actions bot reviewed Nov 14, 2024

Vale feedback

ed80bbc

github-actions bot reviewed Nov 14, 2024

Rewrites

66996ef

github-actions bot reviewed Nov 19, 2024

Merge branch 'main' into weave-evaluations

3e3f4da
