
[MODULE] A module on quantization #169

Draft · wants to merge 13 commits into base: main
Conversation

@michaelshekasta commented Jan 12, 2025

I’d like to propose a new module aimed at optimizing language models for efficient CPU-based inference, reducing reliance on GPUs. The module covers three key areas: quantization techniques, the GGUF model format, and utilizing Intel and MLX accelerators for optimized inference.
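To make the first area concrete, here is a minimal sketch of absmax (symmetric) 8-bit quantization, one of the simplest schemes underlying formats like GGUF. The function names are illustrative, not part of any library API:

```python
import numpy as np

def quantize_absmax_int8(weights: np.ndarray):
    """Map float weights to int8 using one scale per tensor (absmax scheme)."""
    scale = float(np.abs(weights).max()) / 127.0  # largest magnitude maps to +/-127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

# Toy example: a tiny weight tensor round-trips with bounded error.
w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, s = quantize_absmax_int8(w)
w_hat = dequantize_int8(q, s)
# Rounding error is at most half a quantization step (scale / 2).
assert np.max(np.abs(w - w_hat)) <= s / 2 + 1e-6
```

Real GGUF quantization types (e.g. block-wise Q4/Q8 variants) use per-block scales rather than a single per-tensor scale, but the round-to-grid-and-rescale idea is the same.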

What do you think?

@michaelshekasta michaelshekasta marked this pull request as draft January 12, 2025 15:03
@burtenshaw (Collaborator)

Hi @michaelshekasta. Sorry to go quiet on this. I've been wrapped up in an agents course for HF learn this week. I'll review it tomorrow.

@michaelshekasta (Author)

@burtenshaw a gentle reminder.

@burtenshaw burtenshaw changed the title Draft!! Quantization [MODULE] A module on quantization Jan 16, 2025
@burtenshaw (Collaborator)

@michaelshekasta This is a great start. I've implemented a more typical structure. I'd suggest you now follow up with the next stage:

  • find references for each section of the module.
  • add them to the references section of the markdown pages.
  • add a bullet-point note to each section of the page with key topics.
  • highlight sections that you don't understand or need help with.

Once you're ready, I'll review and complete the module's prose.
