## Does MergeKit support GGUF? If not, then feature request: Support for Merging GGUF Quantized Models via MergeKit

Currently, MergeKit doesn't natively support merging GGUF-format quantized models. This feature request proposes adding the capability to merge locally stored GGUF models while preserving their quantization benefits.
### Use Case

- Merge specialized quantized models (e.g., code + math + language)
- Combine LoRA adapters with quantized base models
- Avoid reconversion to FP16/F32 for merging
### Proposed Solution

- Add a GGUF loader/writer interface (see the interface sketch below)
- Implement quantization-aware merging:
  - Support mixed quantization levels with auto-upcasting
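
As a rough illustration of the loader side, here is a minimal sketch of a read-only GGUF tensor source. It assumes the `gguf` Python package that ships with llama.cpp, whose `GGUFReader` exposes per-tensor names, quantization types, and raw data; the wrapper class and its method names are hypothetical, not existing MergeKit APIs.

```python
# Hypothetical GGUF tensor source for MergeKit (sketch only, not an existing API).
# Assumes the `gguf` Python package bundled with llama.cpp, whose GGUFReader
# exposes tensors with .name, .tensor_type, and .data attributes.
from __future__ import annotations

import numpy as np
from gguf import GGUFReader


class GGUFTensorSource:
    """Read-only view over a GGUF file, keyed by tensor name."""

    def __init__(self, path: str):
        self._reader = GGUFReader(path)  # memory-maps the file rather than copying it
        self._by_name = {t.name: t for t in self._reader.tensors}

    def tensor_names(self) -> list[str]:
        return list(self._by_name)

    def quant_type(self, name: str):
        # A GGMLQuantizationType enum member, e.g. Q4_K or F16.
        return self._by_name[name].tensor_type

    def raw_data(self, name: str) -> np.ndarray:
        # Still-quantized block data; dequantization is left to the merge step.
        return self._by_name[name].data
```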
### Implementation Steps

1. Add a GGUF file loader using the llama.cpp Python bindings
2. Create quantization-aware merging strategies (sketched after this list)
3. Implement memory-efficient partial loading
4. Add a GGUF writer with quantization preservation
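
To make these steps concrete, below is a hedged sketch of a per-tensor linear merge over GGUF files. It assumes the `gguf` package's `GGUFReader`/`GGUFWriter` plus its `gguf.quants.dequantize`/`quantize` helpers (present in recent gguf-py releases, though support varies by quantization type); the function name, weighting scheme, and Q8_0 output choice are illustrative only, and metadata/KV copying is elided.

```python
# Sketch of a quantization-aware linear merge (illustrative; not MergeKit code).
# Assumes gguf-py's GGUFReader/GGUFWriter and gguf.quants.dequantize/quantize;
# exact signatures and supported quant types should be checked against the
# installed gguf version. KV-metadata copying and shape bookkeeping are elided.
import numpy as np
from gguf import GGUFReader, GGUFWriter
from gguf.constants import GGMLQuantizationType
from gguf.quants import dequantize, quantize


def merge_gguf_linear(paths: list[str], weights: list[float], out_path: str) -> None:
    readers = [GGUFReader(p) for p in paths]
    tensor_maps = [{t.name: t for t in r.tensors} for r in readers]
    writer = GGUFWriter(out_path, arch="llama")  # arch should really be copied from the inputs

    # Process one tensor at a time so only a single FP32 copy is ever resident.
    for name in tensor_maps[0]:
        acc = None
        for tmap, w in zip(tensor_maps, weights):
            t = tmap[name]
            # Auto-upcast: dequantize each source to FP32 regardless of its stored type.
            dense = dequantize(t.data, t.tensor_type).astype(np.float32)
            acc = w * dense if acc is None else acc + w * dense
        # Requantize the merged tensor; Q8_0 is used here because gguf-py can
        # quantize to it in pure Python (K-quants may need llama.cpp's quantize tool).
        packed = quantize(acc, GGMLQuantizationType.Q8_0)
        writer.add_tensor(name, packed, raw_dtype=GGMLQuantizationType.Q8_0)

    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.write_tensors_to_file()
    writer.close()
```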
### Configuration Example
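
MergeKit merge configs are YAML, so a GGUF-aware config might look like the following. The overall layout (`merge_method`, `models`, `parameters.weight`, `dtype`) mirrors MergeKit's existing format; the `.gguf` paths and the `out_quantization` key are hypothetical additions for this proposal.

```yaml
# Hypothetical example: standard MergeKit layout, with GGUF inputs and an
# out_quantization key that do not exist in MergeKit today.
merge_method: linear
models:
  - model: ./models/code-expert.Q4_K_M.gguf
    parameters:
      weight: 0.5
  - model: ./models/math-expert.Q4_K_M.gguf
    parameters:
      weight: 0.5
dtype: float32            # working precision while merging
out_quantization: Q4_K_M  # hypothetical: requantize the merged result to this type
```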
### Possible Challenges

- Handling different quantization methods
- Preserving quantization accuracy
- Memory management for large merges
### Additional Context

Tested with:

- llama.cpp 0.8.0
- MergeKit 0.3.1
- GGUF v3
### Community Benefit

This would enable users to:

- Reduce VRAM requirements by 4-10x during merging
- Maintain quantization benefits end-to-end
- Combine models from different quantization sources
### Request

Please 👍 if you need this feature and share your use case below.

CC: @mergekit-maintainers