Performance and Accuracy Benchmarks #53

Open
Tracked by #124
jamescho72 opened this issue Sep 29, 2024 · 1 comment
Labels: 📐 benchmark (Benchmarking granite)

@jamescho72

Set up and run benchmarks against our continue.dev/ollama/granite environment.
Run baselines for granite 8B instruct (128k context length) and for our competitors: Deepseek2.5 (2.4B active and 21B active), codestral-mamba 7B, and llama3-8B-instruct.

Find a 100-line code example
Ask chat to document it
Measure latency (how long the request takes to complete)
Measure accuracy (how many lines of documentation were generated, and how accurate/correct the documentation was, e.g. 9 of 10 lines correct)
Measure CPU consumption and memory consumption
Automate/standardize the test as much as possible
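
The latency and accuracy steps above could be automated along these lines. This is only a minimal sketch, not an agreed harness: `ollama_generate`, `time_runs`, and `accuracy` are hypothetical helper names, the endpoint assumes Ollama's default local address, and counting non-empty output lines is a rough proxy for "lines of documentation generated". CPU and memory measurement is left out because it has to be taken on the Ollama server process rather than on this script.

```python
import json
import statistics
import time
import urllib.request
from typing import Callable

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ollama_generate(model: str, prompt: str) -> str:
    """Send one non-streaming generate request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]


def time_runs(generate: Callable[[str, str], str], model: str, prompt: str, runs: int = 3) -> dict:
    """Call generate() `runs` times and record the wall-clock latency of each call."""
    latencies = []
    output = ""
    for _ in range(runs):
        start = time.perf_counter()
        output = generate(model, prompt)
        latencies.append(time.perf_counter() - start)
    return {
        "model": model,
        "median_latency_s": statistics.median(latencies),
        # Rough proxy: non-empty lines in the last response.
        "doc_lines": len([line for line in output.splitlines() if line.strip()]),
        "output": output,
    }


def accuracy(judgements: list[bool]) -> float:
    """Fraction of generated documentation lines that a reviewer marked correct."""
    return sum(judgements) / len(judgements) if judgements else 0.0
```

To compare models, call `time_runs(ollama_generate, tag, prompt)` once per model tag installed in Ollama, using the same 100-line example in the prompt each time, then have a reviewer mark each generated line as correct or not and pass those marks to `accuracy`.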

@harshmittalibm

I have put my initial findings here -

https://ibm.box.com/s/l69aksjokmnwdb6u2frpd715d6537pq8

It contains a latency comparison between the different models. I will update it with the latency of documentation generation and its accuracy.

@nichjones1 nichjones1 moved this to Backlog in Granite.Code Nov 20, 2024
@deboer-tim deboer-tim added the 📐 benchmark Benchmarking granite label Dec 6, 2024