
What is the VRAM requirement for 14B models? #41

genkv opened this issue Feb 26, 2025 · 10 comments

Comments


genkv commented Feb 26, 2025

In the README, we learned that the 1.3B model requires as little as 8 GB of VRAM to run. What about the 14B models? Any suggestions?
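
A rough, weights-only estimate helps frame the question: memory for the model weights is roughly parameter count times bytes per parameter, and activations, the T5 text encoder, and the VAE add more on top. A quick sketch (illustrative arithmetic only, not a measurement):

```python
# Rough weights-only memory estimate: params * bits_per_param / 8.
# Activations, the T5 text encoder, and the VAE are not included.
def weights_gib(params: float, bits_per_param: float) -> float:
    return params * bits_per_param / 8 / 1024**3

for label, bits in [("fp16/bf16", 16), ("fp8/int8", 8), ("4-bit", 4)]:
    print(f"14B  @ {label}: ~{weights_gib(14e9, bits):.1f} GiB")
    print(f"1.3B @ {label}: ~{weights_gib(1.3e9, bits):.1f} GiB")
```

By this estimate the 14B weights alone are around 26 GiB in bf16, which is why it does not fit comfortably on a 24 GB card without offloading or quantization.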


Jandown commented Feb 26, 2025

I tested the 1.3B model and it actually used 24 GB of video memory.

@worker121

I tested the 14B model, and a total of 40 GB × 8 GPUs of GPU memory was used.

@FurkanGozukara

14B works even on 24 GB with optimization, but it takes around 3 hours on an RTX 3090 Ti.

It takes around 30 minutes on an H200 :D and it used 56 GB of VRAM in my app.

#20
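
For reference, the usual lever for fitting the 14B model on a 24 GB card is CPU offloading, at the cost of speed. A minimal sketch assuming the Diffusers Wan2.1 integration (the model id and call arguments below are assumptions based on the Diffusers docs; adjust to whichever backend you actually use):

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Assumed Diffusers-format checkpoint id for the 14B T2V model.
model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"

vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)

# Keep sub-models on the CPU and move each to the GPU only while it runs,
# trading generation speed for a much lower VRAM peak.
pipe.enable_model_cpu_offload()

frames = pipe(
    prompt="A cat walks on the grass, realistic style.",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=16)
```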

@Sankalan13

1.3B actually doesn't run on 8 GB VRAM; if it does, then the documentation in the README is not up to date. I have been at it since yesterday, trying everything I can with my 3060 12 GB, but I am unable to get this to work.
@FurkanGozukara do you have no shame in using an open-source model to market your Patreon account? I also see this is repeated behaviour in many open-source communities that you have been banned from. Issues are raised in open-source repositories for help; if you do not intend to help people with the optimisations you have done and you hide them behind a paywall, there is no need for you to hijack posts and ask people to pay to use your app. Dude, stop this behaviour.

@FurkanGozukara

1.3B works with as little as 3.5 GB VRAM, and at full usage it takes around 6.5 GB, so it should run perfectly on 8 GB GPUs.

Here is a step-by-step tutorial where I have shown this with evidence: https://youtu.be/hnAhveNy-8s


pikachurus commented Mar 1, 2025

An Nvidia L40 with 48 GB of VRAM is not enough for the 14B model, even at size 480×832.

But it works fine with the --fp8 option (from commit #80):

[Image]

36.74 s/it
31 minutes
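
For context on why --fp8 helps: storing the DiT weights in an 8-bit float keeps the resident copy at 1 byte per parameter instead of 2, with the weight upcast per layer at compute time. A rough sketch of the idea (this is not the actual --fp8 implementation, and the layer size is just an example):

```python
import torch

# Weight-only fp8 storage: the resident tensor is 1 byte/param; it is
# upcast to bf16 only for the duration of each matmul.
w_bf16 = torch.randn(5120, 5120, dtype=torch.bfloat16)   # example layer weight
w_fp8 = w_bf16.to(torch.float8_e4m3fn)
print(w_bf16.element_size(), "vs", w_fp8.element_size())  # 2 vs 1 byte per element

def linear_fp8(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    return x @ w.to(torch.bfloat16).t()

y = linear_fp8(torch.randn(4, 5120, dtype=torch.bfloat16), w_fp8)
```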


maxpaynestory commented Mar 3, 2025

I have created this with Wan2.1 on ComfyUI:

Text2VideoWanFunnyHorse_00007.webm

on my laptop's RTX 3050 Ti laptop GPU.

[Image]


able2608 commented Mar 5, 2025

In short, 6 GB VRAM (RTX 3060 Laptop) + 16 GB RAM should work for both the 1.3B and 14B models, but quantized models are needed for the 14B ones.

6 GB VRAM + 16 GB RAM is definitely enough to run the 1.3B T2V DiT model comfortably without quantization (CLIP, VAE and DiT weights straight from the original repo without modification). I have succeeded in generating videos at a size of 512x768x121 (it takes a while, around 20 minutes or so; smaller sizes run a lot faster, down to a few minutes. I have also tested 512x768x161 but forgot how long it took), but you might need to use tiled VAE decoding.
For the 14B models, I have successfully run with the same setup using a Q3_K_S quant of the DiT model at sizes of 512x512x33 and 384x512x69 (taking around 15 minutes per generation). The I2V model seems to work fine even with this amount of quantization, but I have heard that quantized T2V models give bad results.

BTW I am using the ComfyUI native implementation (with sage attention) on Windows for all the experiments. This repo might take up more resources as it is not as optimized as ComfyUI.
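
For readers wondering what tiled VAE decoding buys you: instead of decoding the whole latent at once, the decoder runs on spatial tiles and the outputs are stitched together, so peak activation memory scales with the tile size rather than the frame size. A simplified, framework-agnostic sketch of the idea (ComfyUI and Diffusers ship their own tuned implementations with overlapping tiles and seam blending):

```python
import torch
import torch.nn.functional as F

def tiled_decode(latent, decode_fn, tile=64, scale=8):
    # Decode a (B, C, H, W) latent in spatial tiles so peak activation memory
    # depends on the tile size, not the full frame size.
    b, _, h, w = latent.shape
    out = None
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            piece = decode_fn(latent[:, :, y:y + tile, x:x + tile])
            if out is None:
                out = latent.new_zeros(b, piece.shape[1], h * scale, w * scale)
            out[:, :, y * scale:y * scale + piece.shape[2],
                      x * scale:x * scale + piece.shape[3]] = piece
    return out

# Stand-in for a real VAE decoder: 8x spatial upscale of 3 channels.
def fake_decode(z):
    return F.interpolate(z[:, :3], scale_factor=8, mode="nearest")

frame = tiled_decode(torch.randn(1, 16, 96, 64), fake_decode)  # -> (1, 3, 768, 512)
```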


maxpaynestory commented Mar 5, 2025

@able2608

This repo might take up more resources as it is not as optimized as ComfyUI.

Do ComfyUI, Diffusers, and the Wan2.1 repo have different code? If so, which one has the best-optimized code?


able2608 commented Mar 5, 2025

AFAIK ComfyUI comes with its own implementation and memory management system, and (the Comfy native implementation) does not use Diffusers as a backend. I'm not entirely sure about the difference between the implementation of this repo and Diffusers, but they are probably quite different from ComfyUI's anyway.
For now (I think) ComfyUI comes with the best optimization out of the box and works most of the time without too much messing around. You might be able to get similar levels of optimization (or even more) with Diffusers or the official repo, but further tweaking might be needed.
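
If you want to compare actual peak VRAM across ComfyUI, Diffusers, and this repo rather than guess, PyTorch's allocator statistics are a quick way to check (a small sketch; run one generation between the two calls, inside whichever process is doing the generation):

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run one generation here with the backend you are testing ...

peak = torch.cuda.max_memory_allocated() / 1024**3
reserved = torch.cuda.max_memory_reserved() / 1024**3
print(f"peak allocated: {peak:.2f} GiB, peak reserved: {reserved:.2f} GiB")
```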
