# llm-memory-calculator

A very simple calculator to estimate the GPU memory usage of a 🤗 Transformers model, based on its configuration file (`config.json`). The script produces separate estimates for training, finetuning, and inference.

⚠️ Nearly untested: use at your own risk. PRs welcome!

## Basic usage

```bash
python llm_memory_calculator.py config.json --mode train --batch_size 32 --seq_length 512
```
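Here `config.json` is the model's standard 🤗 Transformers configuration file. For reference, a Llama-2-7B-style config contains fields like these (illustrative excerpt; the exact keys vary by architecture):

```json
{
  "architectures": ["LlamaForCausalLM"],
  "hidden_size": 4096,
  "intermediate_size": 11008,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "vocab_size": 32000,
  "torch_dtype": "float16"
}
```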

## Features

- Supports training, finetuning, and inference calculations (a sketch of how the main memory terms combine is shown below)
- Accounts for different precisions (float32, float16, bfloat16, int8, fp8)
- Includes memory optimizations like gradient checkpointing
- Supports LoRA for finetuning estimates
- Considers KV cache for inference
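
To make these terms concrete: a training-memory estimate is typically dominated by weights, gradients, optimizer states, and activations. The following is a minimal, hypothetical sketch of that arithmetic, not the script's actual implementation; the activation term and the assumed ~4x gradient-checkpointing saving are rough placeholders.

```python
# Hypothetical sketch of a training-memory estimate; NOT this repo's actual code.
BYTES_PER_DTYPE = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1, "fp8": 1}

def estimate_train_memory_gib(n_params, batch_size, seq_length, hidden_size,
                              num_layers, dtype="float16",
                              gradient_checkpointing=False):
    b = BYTES_PER_DTYPE[dtype]
    weights = n_params * b           # model weights
    gradients = n_params * b         # one gradient per trainable weight
    optimizer = n_params * 4 * 2     # Adam: fp32 momentum + variance
    # Very rough activation term; real values depend heavily on the architecture.
    activations = batch_size * seq_length * hidden_size * num_layers * b
    if gradient_checkpointing:
        activations /= 4             # assumed ~4x saving, traded for recompute
    return (weights + gradients + optimizer + activations) / 1024**3

# Example: a 7B-parameter, Llama-2-7B-shaped model in float16
print(f"~{estimate_train_memory_gib(7e9, 32, 512, 4096, 32):.0f} GiB")
```

For inference, the gradient and optimizer terms drop out, leaving weights, activations, and (optionally) the KV cache.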

## Advanced usage

```bash
# Training with gradient checkpointing
python llm_memory_calculator.py config.json --mode train --batch_size 32 --seq_length 512 --gradient_checkpointing --dtype float16
```

```bash
# Finetuning with LoRA
python llm_memory_calculator.py config.json --mode finetune --batch_size 16 --seq_length 512 --lora_rank 8 --dtype float16
```
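
With LoRA, only the low-rank adapter weights are trainable, so the gradient and optimizer terms scale with the adapter size rather than the full model. A minimal sketch of the adapter parameter count, under assumed defaults (hypothetical helper, not the script's code):

```python
# Hypothetical LoRA trainable-parameter count; assumes adapters on the q/v
# attention projections only (a common default), each of shape (hidden, hidden).
def lora_trainable_params(num_layers, hidden_size, rank=8, adapted_per_layer=2):
    # Each adapted matrix gains A (rank x hidden) and B (hidden x rank)
    return num_layers * adapted_per_layer * rank * hidden_size * 2

# Example: 32 layers, hidden_size 4096, rank 8 -> ~4.2M trainable parameters
print(f"{lora_trainable_params(32, 4096, rank=8):,}")
```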

```bash
# Inference with KV cache
python llm_memory_calculator.py config.json --mode inference --batch_size 1 --seq_length 1024 --kv_cache --dtype float16
```
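
The KV cache grows linearly with both batch size and sequence length, and often dominates inference memory at long contexts. A rough sketch of that term, assuming standard multi-head attention (hypothetical helper, not the script's code):

```python
# Hypothetical KV-cache size estimate; assumes every layer caches keys and
# values for every position in the sequence.
def kv_cache_gib(num_layers, num_heads, head_dim, batch_size, seq_length,
                 bytes_per_elem=2):  # 2 bytes for float16/bfloat16
    total = (2 * num_layers * batch_size * seq_length   # 2x for keys and values
             * num_heads * head_dim * bytes_per_elem)
    return total / 1024**3

# Example: Llama-2-7B-like shape (32 layers, 32 heads, head_dim 128) at seq 1024
print(f"{kv_cache_gib(32, 32, 128, 1, 1024):.2f} GiB")  # ~0.50 GiB
```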

**Note:** the calculator only returns an estimate. Actual memory usage may vary with framework implementation details, memory fragmentation, system overhead, the specific optimizations used, and so on.
