PyTorch native quantization and sparsity for training and inference
Topics: training, sparsity, cuda, inference, optimizer, pytorch, transformer, offloading, llama, quantization, mx, brrr, dtypes, float8
Updated Jan 4, 2025 - Python