
[WIP] Benchmark MLOps.NET #320

Closed
wants to merge 6 commits into from

Conversation

Brett-Parker
Collaborator

Fixes #233

@Brett-Parker
Collaborator Author

@aslotte I have pushed this as a baseline. I have been investigating this one for a few hours and there doesn't appear to be a great way to do this.

The best I have come up with is:

Created a console app.

This app can be called from the command line via `dotnet MLOps.NET.Benchmarks.dll`.
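
For reference, the entry point is really just a thin wrapper around BenchmarkDotNet's switcher; a minimal sketch along those lines (class and namespace names are illustrative, not necessarily what is in the PR):

```csharp
using BenchmarkDotNet.Running;

namespace MLOps.NET.Benchmarks
{
    public class Program
    {
        public static void Main(string[] args)
        {
            // Discovers and runs every benchmark class in this assembly.
            // Using the switcher (rather than BenchmarkRunner.Run<T>()) also
            // gives us --list and --filter support on the command line.
            BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
        }
    }
}
```

With that in place, `dotnet MLOps.NET.Benchmarks.dll` (or `dotnet run -c Release`) runs the full suite.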

Results
(screenshot of benchmark results)

Comparisons

The only thing I can find for baselining and comparing is something like this:
https://github.com/dotnet/performance/tree/master/src/tools/ResultsComparer#sample-results

Create a baseline CSV report; then every time we run the benchmarks, the results are compared against that baseline.
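
Concretely (going by the ResultsComparer README linked above; the exact flags may differ): run the benchmarks with an exporter enabled, keep a committed baseline of the exported results, and on each subsequent run invoke the comparer with something like `dotnet run -- --base <baseline-folder> --diff <new-results-folder> --threshold 2%` to flag regressions beyond the threshold.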

I have used Moq here to put something in the file so we can see roughly what structure I was thinking of.

Before I continue I'd like some thoughts/feedback if possible.

@Brett-Parker Brett-Parker self-assigned this Aug 29, 2020
@Brett-Parker Brett-Parker changed the title Benchmark MLOps.NET [WIP]Benchmark MLOps.NET Aug 29, 2020
@aslotte
Owner

aslotte commented Aug 29, 2020

Awesome @Brett-Parker, I was actually just thinking of BenchmarkDotNet.
I think you're on the right track, and we can do this in phases: start by having benchmarks, then see how we can implement a comparison and a baseline, and eventually how to integrate some of this into our CI pipelines (that can be step two).

A couple of thoughts:

  1. Benchmarks should run against the real implementation, so we should create an MLOpsContext with access to a real database and a real model repository, as that will give us a good understanding of any actual bottlenecks.
  2. Given that, we probably want benchmarks for the various storage providers: CosmosDB, SQLServer and SQLite. I don't think we need to implement all of these at once, but we probably want some structure to separate them and their various combinations. I think we can keep one MLOps.NET.Benchmark project and have different folders, e.g.

```
SQLServer
├── LifeCycleCatalogBenchmark
├── DeploymentCatalogBenchmark
└── ...

SQLite
├── LifeCycleCatalogBenchmark
├── DeploymentCatalogBenchmark
└── ...

CosmosDb
├── LifeCycleCatalogBenchmark
├── DeploymentCatalogBenchmark
└── ...
```

Let me know if you can think of any other structure though, happy to bounce some ideas :)
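
To make point 1 concrete, here is a rough sketch of what a storage-specific benchmark (e.g. SQLite/LifeCycleCatalogBenchmark) could look like, assuming BenchmarkDotNet and an MLOpsBuilder-style setup; the builder and catalog method names below are placeholders and may differ from the actual MLOps.NET API:

```csharp
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using MLOps.NET;

namespace MLOps.NET.Benchmarks.SQLite
{
    public class LifeCycleCatalogBenchmark
    {
        private IMLOpsContext mlOpsContext;

        [GlobalSetup]
        public void Setup()
        {
            // Build a context against a real database and a real model
            // repository so the numbers reflect actual bottlenecks.
            // NOTE: UseSQLite/UseLocalFileModelRepository are assumptions here.
            mlOpsContext = new MLOpsBuilder()
                .UseSQLite()
                .UseLocalFileModelRepository()
                .Build();
        }

        [Benchmark]
        public async Task CreateRun()
        {
            // CreateRunAsync stands in for whatever the LifeCycle catalog
            // actually exposes for creating a run.
            await mlOpsContext.LifeCycle.CreateRunAsync("benchmark-experiment");
        }
    }
}
```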

@Brett-Parker
Collaborator Author

@aslotte thanks for your quick reply. I agree with everything. I'll take a look in the morning and implement something basic as a baseline.

@Brett-Parker
Collaborator Author

@aslotte Thoughts on this project structure?

(screenshot of the proposed project structure)

Each database will have its own setup and cleanup.

If you agree with this approach, I will start populating each provider with at least one benchmark for every catalog. Then this PR can be closed and I will open a new issue for the comparison and CI work.
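
For the setup/cleanup part, a minimal skeleton of the lifecycle hooks each storage-specific class would carry (the BenchmarkDotNet attributes are real; the class name and comment bodies are illustrative):

```csharp
using BenchmarkDotNet.Attributes;

public class CosmosDbLifeCycleCatalogBenchmark
{
    [GlobalSetup]
    public void Setup()
    {
        // Runs once before any benchmark in this class:
        // create the MLOpsContext and the benchmark database here.
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        // Runs once after all benchmarks in this class have finished:
        // drop the benchmark database / clean up any test data here.
    }
}
```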

@aslotte
Owner

aslotte commented Aug 30, 2020

@Brett-Parker looks great, and it allows us to modify the structure as needed going forward. I like it. Just a heads up, there's one too many Ls in SQLite :)

Added LifeCycleCatalog Benchmark only
@Brett-Parker
Collaborator Author

@aslotte I have now added LifeCycleCatalog and completed the structure for this. I think this is now a good place to review this PR. I will create a new issue for expanding this to other catalogs.

(screenshot of the completed project structure)

Things still to do:

  • Add the other catalog benchmarks
  • Investigate BenchmarkDotNet's `--filter` option to allow benchmarking specific integrations (see the note below this list)
  • Benchmark cleanup
  • Add comparison to benchmarks (#321 - Benchmark comparison)
  • GitHub Action to flag whether a comparison result is acceptable or not
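
On the `--filter` item: since the entry point uses BenchmarkSwitcher, BenchmarkDotNet's glob filter should let us run only one provider's benchmarks, e.g. `dotnet MLOps.NET.Benchmarks.dll --filter '*SQLite*'` (exact pattern to be confirmed once the namespaces are settled).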

@Brett-Parker Brett-Parker requested a review from aslotte August 30, 2020 15:46
@Brett-Parker Brett-Parker changed the title [WIP]Benchmark MLOps.NET Benchmark MLOps.NET Aug 30, 2020
Removed unnecessary usings.
@Brett-Parker
Collaborator Author

@aslotte sorry, found mistakes. Rectified them now.

@aslotte
Owner

aslotte commented Aug 30, 2020

Great @Brett-Parker!
A couple of thoughts:

  • I think we can keep all config values in one appsettings.json
  • We can add a reference to MLOps.NET.Tests.Common to use the configuration builder there
  • Just like for the integration tests, we'll have the exact same benchmarks for the different storage providers; the only thing that changes is the setup. One way to make it faster to write new benchmarks across all storage providers would be to create a base class, LifeCycleCatalogBenchmarks.cs, where we store all the benchmarks (e.g. the one you just created for creating a run), keep the GlobalSetup in the storage-specific benchmark classes, and have them inherit from LifeCycleCatalogBenchmarks (see the sketch below).

That way we only need to write each benchmark once, and it will run for all storage providers.
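
A sketch of that base-class layout (same caveat as before: the MLOpsBuilder methods and the CreateRunAsync call are placeholders for the real MLOps.NET API):

```csharp
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using MLOps.NET;

// All benchmarks live in the base class and are written once.
public abstract class LifeCycleCatalogBenchmarks
{
    protected IMLOpsContext mlOpsContext;

    [Benchmark]
    public async Task CreateRun()
    {
        await mlOpsContext.LifeCycle.CreateRunAsync("benchmark-experiment");
    }
}

// The storage-specific classes only provide the setup.
public class SQLiteLifeCycleCatalogBenchmark : LifeCycleCatalogBenchmarks
{
    [GlobalSetup]
    public void Setup()
    {
        mlOpsContext = new MLOpsBuilder()
            .UseSQLite()                    // placeholder builder call
            .UseLocalFileModelRepository()  // placeholder builder call
            .Build();
    }
}

public class SQLServerLifeCycleCatalogBenchmark : LifeCycleCatalogBenchmarks
{
    [GlobalSetup]
    public void Setup()
    {
        // Connection string would come from the shared appsettings.json.
        mlOpsContext = new MLOpsBuilder()
            .UseSQLServer("<connection-string>")  // placeholder builder call
            .UseLocalFileModelRepository()
            .Build();
    }
}
```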

@aslotte
Owner

aslotte commented Aug 30, 2020

Let me know if that makes sense and if I was able to explain it properly. I think it should work, but I'm happy to bounce some ideas.

@Brett-Parker Brett-Parker changed the title Benchmark MLOps.NET [WIP] Benchmark MLOps.NET Sep 1, 2020
@aslotte
Owner

aslotte commented Sep 15, 2020

Did you intend to close this one @Brett-Parker?

@Brett-Parker Brett-Parker deleted the issue_233 branch October 3, 2020 13:31