
How to merge GGUF-quantized models? #494

Open
lexasub opened this issue Jan 24, 2025 · 1 comment
Comments


lexasub commented Jan 24, 2025

Does MergeKit support GGUF? If not, then this is a feature request: Support for Merging GGUF Quantized Models via MergeKit.
Currently, MergeKit doesn't natively support merging GGUF-format quantized models. This feature request proposes adding the capability to merge locally stored GGUF models while preserving the benefits of quantization.

Use Case

Merge specialized quantized models (e.g., code+math+language)

Combine LoRA adapters with quantized base models

Avoid reconversion to FP16/F32 for merging

Proposed Solution

Add GGUF loader/writer interface

Implement quantization-aware merging:

class GGUFMerger:
    def __init__(self, config):
        self.config = config
        # Dispatch table: GGUF quantization type -> merge routine
        self.quant_methods = {
            'q4_k': self._merge_q4_k,
            'q5_k': self._merge_q5_k,
        }

    def _merge_q4_k(self, tensors, weights):
        raise NotImplementedError  # quantization-aware merge for Q4_K tensors

    def _merge_q5_k(self, tensors, weights):
        raise NotImplementedError  # quantization-aware merge for Q5_K tensors

Support mixed quantization levels with auto-upcasting
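As a rough illustration of the auto-upcasting idea, the sketch below dequantizes each input tensor to float32, takes a weighted average, and re-quantizes to the highest-precision input format. The dequantize/quantize helpers are hypothetical placeholders for llama.cpp's actual Q4_K/Q5_K codecs, and none of this is existing MergeKit API:

import numpy as np

# Hypothetical codec helpers; real implementations would wrap llama.cpp's
# Q4_K / Q5_K (de)quantization kernels.
def dequantize(blob, quant_type) -> np.ndarray:
    raise NotImplementedError

def quantize(arr: np.ndarray, quant_type):
    raise NotImplementedError

# Rough precision ordering used to pick the "highest" input quantization.
QUANT_RANK = {'q4_k': 4, 'q5_k': 5, 'q5_k_m': 5, 'q8_0': 8}

def merge_with_upcast(tensors, weights, quant_types, output_quant=None):
    # Upcast every tensor to float32 so mixed quantization levels can be averaged.
    upcast = [dequantize(t, q) for t, q in zip(tensors, quant_types)]
    merged = sum(w * t for w, t in zip(weights, upcast)) / sum(weights)
    # Default output: the highest-precision quantization among the inputs.
    target = output_quant or max(quant_types, key=QUANT_RANK.__getitem__)
    return quantize(merged.astype(np.float32), target)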

Implementation Steps

Add GGUF file loader using llama.cpp's Python bindings (see the sketch after this list)

Create quantization-aware merging strategies

Implement memory-efficient partial loading

Add GGUF writer with quantization preservation
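For the GGUF file loader, the gguf Python package maintained in the llama.cpp repository already memory-maps tensor data, which would also help with memory-efficient partial loading. A minimal reader sketch, assuming gguf's GGUFReader interface (attribute names may differ between package versions):

from gguf import GGUFReader  # pip install gguf

def iter_gguf_tensors(path):
    # GGUFReader memory-maps the file, so tensor data is not copied into RAM up front.
    reader = GGUFReader(path)
    for tensor in reader.tensors:
        # tensor.data is a numpy view over the still-quantized bytes;
        # tensor.tensor_type reports the quantization type (e.g. Q4_K, Q5_K).
        yield tensor.name, tensor.tensor_type.name.lower(), tensor.data

for name, quant, data in iter_gguf_tensors("./wizardmath-7b.Q4_K.gguf"):
    print(name, quant, data.nbytes)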

Configuration Example

merge_method: dare_ties  
models:  
  - model: ./codellama-34b.Q5_K.gguf  
    parameters:  
      weight: 0.6  
      quant: q5_k  
  - model: ./wizardmath-7b.Q4_K.gguf  
    parameters:  
      weight: 0.4  
      quant: q4_k  
base_model: llama  
dtype: quantized  
output_quant: q5_k_m  # Auto-convert lighter quant  
CLI Arguments
mergekit-yaml config.yaml ./merged-model.gguf \  
  --quant-method q5_k_m \  
  --gguf-context 4096 \  
  --quant-merge-strategy conservative  

Possible Challenges

Handling different quantization methods

Preserving quantization accuracy

Memory management for large merges

Additional Context
Tested with:

llama.cpp 0.8.0

MergeKit 0.3.1

GGUF v3

Community Benefit
This would enable users to:

Reduce VRAM requirements by 4-10x during merging

Maintain quantization benefits end-to-end

Combine models from different quantization sources

Request
Please 👍 if you need this feature and share your use case below.
CC: @mergekit-maintainers

@tushar-31093

Looking for the same
