
Update attention.py #116

Open · shirubei wants to merge 1 commit into main

Conversation

shirubei
Adding support for cards that aren't Ampere architecture
@splendiz

splendiz commented Mar 1, 2025

Thanks shirubei, I tried the code with my 8x 2080 Ti setup.
However, I got the error "TypeError: attention() got an unexpected keyword argument 'version'" on each of the graphics cards, and the errors result in "torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ".
Any solutions?
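For anyone hitting the same TypeError: it suggests the call sites still pass a version keyword (used upstream to pick FlashAttention 2 vs 3) that the patched attention() no longer accepts. Below is a minimal, hypothetical sketch of a tolerant fallback signature; the tensor layout and the use of PyTorch's scaled_dot_product_attention are assumptions, not the actual code in this PR:

import torch
import torch.nn.functional as F

def attention(q, k, v, version=None, **unused_kwargs):
    # Hypothetical sketch only. `version` selected FlashAttention 2 vs 3 upstream;
    # a non-Ampere fallback can simply accept and ignore it.
    # Assumes q/k/v are [batch, seq_len, num_heads, head_dim].
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))   # -> [B, H, L, D]
    out = F.scaled_dot_product_attention(q, k, v)      # plain PyTorch attention
    return out.transpose(1, 2).contiguous()            # -> [B, L, H, D]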

@jimbojd72

python generate.py  --task t2v-1.3B --size '832*480' --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."                                                                                     (Wan2.1)  ✔  󰌠 3.13.2  23:16:08 
[2025-03-01 23:16:36,760] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=81, ckpt_dir='./Wan2.1-T2V-1.3B', offload_model=True, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=True, dit_fsdp=False, save_file=None, prompt='Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='ch', base_seed=9018256661711622796, image=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-03-01 23:16:36,760] INFO: Generation model config: {'__name__': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.bfloat16, 'text_len': 512, 'param_dtype': torch.bfloat16, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-03-01 23:16:36,760] INFO: Input prompt: Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.
[2025-03-01 23:16:36,760] INFO: Creating WanT2V pipeline.
[2025-03-01 23:17:19,381] INFO: loading ./Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
[2025-03-01 23:17:32,059] INFO: loading ./Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
[2025-03-01 23:17:32,481] INFO: Creating WanModel from ./Wan2.1-T2V-1.3B
[2025-03-01 23:17:35,481] INFO: Generating video ...
  0%|                                                                                                                                                                                                                                                                                                                                                                                                               | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/data/AI/Wan2.1/generate.py", line 411, in <module>
    generate(args)
    ~~~~~~~~^^^^^^
  File "/mnt/data/AI/Wan2.1/generate.py", line 313, in generate
    video = wan_t2v.generate(
        args.prompt,
    ...<6 lines>...
        seed=args.base_seed,
        offload_model=args.offload_model)
  File "/mnt/data/AI/Wan2.1/wan/text2video.py", line 236, in generate
    noise_pred_cond = self.model(
                      ~~~~~~~~~~^
        latent_model_input, t=timestep, **arg_c)[0]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/AI/Wan2.1/wan/modules/model.py", line 564, in forward
    x = block(x, **kwargs)
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/AI/Wan2.1/wan/modules/model.py", line 298, in forward
    y = self.self_attn(
        self.norm1(x).float() * (1 + e[1]) + e[0], seq_lens, grid_sizes,
        freqs)
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data/AI/Wan2.1/wan/modules/model.py", line 148, in forward
    k=rope_apply(k, grid_sizes, freqs),
      ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/data/AI/Wan2.1/wan/modules/model.py", line 67, in rope_apply
    return torch.stack(output).float()
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 192.00 MiB. GPU 0 has a total capacity of 10.57 GiB of which 244.06 MiB is free. Including non-PyTorch memory, this process has 8.11 GiB memory in use. Of the allocated memory 7.52 GiB is allocated by PyTorch, and 421.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I got further with your branch, but now I'm getting OOM issues with my VRAM. Good try though.
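One thing worth trying before giving up, based on the allocator hint at the end of that traceback (it only helps when fragmentation is the problem, which is an assumption here, so it may not be enough on an 11 GB card):

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python generate.py --task t2v-1.3B --size '832*480' --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."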

@splendiz

splendiz commented Mar 2, 2025


[quotes jimbojd72's command, log, and OOM traceback from the comment above]

Same here: with a single 2080 Ti I get OOM as well, and multiple GPUs are not working. :(

@shirubei
Author

shirubei commented Mar 2, 2025

[quotes jimbojd72's command, log, and OOM traceback from the comment above]

You can go with --frame_num 17, or even less, such as 13 or 9, to test the modification.
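For example, the same command as above with a smaller frame count (9 here is just one of the suggested test values):

python generate.py --task t2v-1.3B --size '832*480' --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --frame_num 9 --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."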

@shirubei
Author

shirubei commented Mar 2, 2025

@splendiz @jimbojd72
I ran the command below with a 2080 Ti 22GB:

python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model False --frame_num 13 --sample_shift 8 --sample_guide_scale 6 --prompt "Ultra-wide angle, night, a young girl in a red dress walks towards the camera from a distance on a noisy street. On both sides of the road, there are continuous shops with soft lighting."

[screenshot of GPU memory usage]

@shirubei
Author

shirubei commented Mar 2, 2025

Finally, the video file was created.
[screenshot]

t2v-1.3B_1_Ultra-wide_angle._night._a_young_girl_in_a_red_dre_20250303_001358.mp4

@jimbojd72

Getting the same error with your new prompt. I don't know enough yet to debug this on my side (first time trying a model on my Arch machine with a 2080 Ti).

My point here is that I should not be a blocker for merging if it doesn't work on my setup.

    /mnt/data/AI/Wan2.1   main !1 ?1  python generate.py --task t2v-1.3B --size '832*480' --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model False --frame_num 13 --sample_shift 8 --sample_guide_scale 6 --prompt "Ultra-wide angle, night, a young girl in a red dress walks towards the camera from a distance on a noisy street. On both sides of the road, there are continuous shops with soft lighting."
[2025-03-02 11:51:36,833] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=13, ckpt_dir='./Wan2.1-T2V-1.3B', offload_model=False, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=False, dit_fsdp=False, save_file=None, prompt='Ultra-wide angle, night, a young girl in a red dress walks towards the camera from a distance on a noisy street. On both sides of the road, there are continuous shops with soft lighting.', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='ch', base_seed=1674726556309183157, image=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-03-02 11:51:36,833] INFO: Generation model config: {'__name__': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.bfloat16, 'text_len': 512, 'param_dtype': torch.bfloat16, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-03-02 11:51:36,833] INFO: Input prompt: Ultra-wide angle, night, a young girl in a red dress walks towards the camera from a distance on a noisy street. On both sides of the road, there are continuous shops with soft lighting.
[2025-03-02 11:51:36,833] INFO: Creating WanT2V pipeline.
[2025-03-02 11:52:28,223] INFO: loading ./Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
[2025-03-02 11:52:36,409] INFO: loading ./Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
[2025-03-02 11:52:36,689] INFO: Creating WanModel from ./Wan2.1-T2V-1.3B
[2025-03-02 11:52:39,375] INFO: Generating video ...
Traceback (most recent call last):
  File "/mnt/data/AI/Wan2.1/generate.py", line 411, in <module>
    generate(args)
    ~~~~~~~~^^^^^^
  File "/mnt/data/AI/Wan2.1/generate.py", line 313, in generate
    video = wan_t2v.generate(
        args.prompt,
    ...<6 lines>...
        seed=args.base_seed,
        offload_model=args.offload_model)
  File "/mnt/data/AI/Wan2.1/wan/text2video.py", line 171, in generate
    self.text_encoder.model.to(self.device)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1343, in to
    return self._apply(convert)
           ~~~~~~~~~~~^^^^^^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 903, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 930, in _apply
    param_applied = fn(param)
  File "/mnt/data/miniconda3/envs/Wan2.1/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1329, in convert
    return t.to(
           ~~~~^
        device,
        ^^^^^^^
        dtype if t.is_floating_point() or t.is_complex() else None,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        non_blocking,
        ^^^^^^^^^^^^^
    )
    ^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.96 GiB. GPU 0 has a total capacity of 10.57 GiB of which 2.01 GiB is free. Including non-PyTorch memory, this process has 6.28 GiB memory in use. Of the allocated memory 5.78 GiB is allocated by PyTorch, and 335.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

@shirubei
Author

shirubei commented Mar 3, 2025

@jimbojd72
Does your 2080 Ti come with 22GB of memory?
From the captured screenshot you can see that frame_num=13 without t5_cpu consumes 21.6 GB of graphics memory plus 2 GB of shared memory (which has to be borrowed from ordinary system RAM).
So if you own an 11GB 2080 Ti, I suggest using --frame_num 5 to test the code.
It may also be better to add the --t5_cpu parameter.

Thank you.
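As a rough back-of-envelope of why frame_num dominates memory, using the config printed in the logs above (vae_stride (4, 8, 8), patch_size (1, 2, 2)); the temporal-latent formula below is an assumption, not taken from the repo:

def dit_tokens(frame_num, height=480, width=832):
    # Sketch only. Temporal latent length assumes the causal-VAE convention (frame_num - 1) / 4 + 1;
    # spatial dims are divided by the VAE stride 8 and then by the patch size 2.
    t_lat = (frame_num - 1) // 4 + 1
    h_tokens, w_tokens = (height // 8) // 2, (width // 8) // 2
    return t_lat * h_tokens * w_tokens

print(dit_tokens(81))  # 32760 tokens at the default frame_num
print(dit_tokens(13))  # 6240 tokens, roughly 5x fewer, which shrinks the attention footprint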

@jimbojd72

11GB

Now I understand the link between frame_num and VRAM 🤦🏻. Thanks for clarifying that for me.

It did work afterward even though the video seems too short!

    /mnt/data/AI/Wan2.1   main !1 ?1  python generate.py  --task t2v-1.3B --size '832*480' --ckpt_dir ./Wan2.1-T2V-1.3B --offload_model True --t5_cpu --sample_shift 8 --sample_guide_scale 6 --frame_num 5 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."                                                            (Wan2.1)  ✔   16s  󰌠 3.13.2  23:15:48 
[2025-03-02 23:15:51,906] INFO: Generation job args: Namespace(task='t2v-1.3B', size='832*480', frame_num=5, ckpt_dir='./Wan2.1-T2V-1.3B', offload_model=True, ulysses_size=1, ring_size=1, t5_fsdp=False, t5_cpu=True, dit_fsdp=False, save_file=None, prompt='Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.', use_prompt_extend=False, prompt_extend_method='local_qwen', prompt_extend_model=None, prompt_extend_target_lang='ch', base_seed=5029517465784891079, image=None, sample_solver='unipc', sample_steps=50, sample_shift=8.0, sample_guide_scale=6.0)
[2025-03-02 23:15:51,906] INFO: Generation model config: {'__name__': 'Config: Wan T2V 1.3B', 't5_model': 'umt5_xxl', 't5_dtype': torch.bfloat16, 'text_len': 512, 'param_dtype': torch.bfloat16, 'num_train_timesteps': 1000, 'sample_fps': 16, 'sample_neg_prompt': '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 't5_checkpoint': 'models_t5_umt5-xxl-enc-bf16.pth', 't5_tokenizer': 'google/umt5-xxl', 'vae_checkpoint': 'Wan2.1_VAE.pth', 'vae_stride': (4, 8, 8), 'patch_size': (1, 2, 2), 'dim': 1536, 'ffn_dim': 8960, 'freq_dim': 256, 'num_heads': 12, 'num_layers': 30, 'window_size': (-1, -1), 'qk_norm': True, 'cross_attn_norm': True, 'eps': 1e-06}
[2025-03-02 23:15:51,906] INFO: Input prompt: Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage.
[2025-03-02 23:15:51,906] INFO: Creating WanT2V pipeline.
[2025-03-02 23:16:35,034] INFO: loading ./Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth
[2025-03-02 23:16:44,671] INFO: loading ./Wan2.1-T2V-1.3B/Wan2.1_VAE.pth
[2025-03-02 23:16:44,968] INFO: Creating WanModel from ./Wan2.1-T2V-1.3B
[2025-03-02 23:16:47,722] INFO: Generating video ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [02:45<00:00,  3.30s/it]
[2025-03-02 23:24:00,482] INFO: Saving generated video to t2v-1.3B_832*480_1_1_Two_anthropomorphic_cats_in_comfy_boxing_gear_and__20250302_232400.mp4
[2025-03-02 23:24:00,961] INFO: Finished.
t2v-1.3B_832.480_1_1_Two_anthropomorphic_cats_in_comfy_boxing_gear_and__20250302_232400.mp4

@qianzhouyi2

I have tested this. It needs frame_num under about 12 for a 2080 Ti 22G to run smoothly. It took 5 minutes to generate an 832*480 video.
python generate.py --task t2v-1.3B --size 832*480 --ckpt_dir ./Wan2.1-T2V-1.3B --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --frame_num 12

@splendiz

splendiz commented Mar 6, 2025

A single GPU may work, but what about multiple GPUs? Has anyone tested the code?
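For reference, multi-GPU runs go through torchrun with the sequence-parallel and FSDP options that appear in the argument namespace printed in the logs above (ulysses_size, ring_size, dit_fsdp, t5_fsdp). The invocation below is a sketch assuming 8 GPUs; whether it fits in 2080 Ti memory is untested here:

torchrun --nproc_per_node=8 generate.py --task t2v-1.3B --size '832*480' --ckpt_dir ./Wan2.1-T2V-1.3B --dit_fsdp --t5_fsdp --ulysses_size 8 --ring_size 1 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."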
