
Add macOS Compatibility #69

Open
bakhti-ai wants to merge 27 commits into main from macos-compatibility

Conversation

bakhti-ai

Overview

This pull request introduces compatibility improvements for running the Wan2.1 text-to-video model on macOS systems with M1 Pro chips. It also includes enhancements to the documentation to assist macOS users in setting up and using the model effectively.

Key Changes

  1. MPS Compatibility: Adapted CUDA-specific code to work with Metal Performance Shaders (MPS) on macOS, allowing the model to run on M1 Pro chips.
  2. Environment Variable for Fallback: Implemented the use of PYTORCH_ENABLE_MPS_FALLBACK=1 to enable CPU fallback for operations not supported by MPS (see the sketch after this list).
  3. Command-Line Adjustments: Modified command-line arguments to improve compatibility and performance on macOS.
  4. Documentation Updates: Enhanced the README with detailed installation instructions, usage examples, and optimization tips specifically for macOS users.
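
To illustrate points 1 and 2 above, here is a minimal sketch of the device-selection pattern this relies on (illustrative only, not the exact code in this pull request):

    import os

    # Enable CPU fallback for operations MPS does not implement yet.
    # This must be set before PyTorch initializes the MPS backend.
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

    import torch

    if torch.backends.mps.is_available():
        device = torch.device("mps")   # Apple Silicon GPU
    elif torch.cuda.is_available():
        device = torch.device("cuda")  # NVIDIA GPU
    else:
        device = torch.device("cpu")

    print(f"Running on: {device}")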

Benefits

  • Broader Accessibility: Enables macOS users, particularly those with M1 Pro chips, to utilize the Wan2.1 model without encountering CUDA-related issues.
  • Improved User Experience: Provides clear guidance and best practices for setting up and running the model on macOS, reducing setup time and potential errors.
  • Community Contribution: Shares valuable insights and solutions with the community, potentially benefiting other users facing similar challenges.

Testing

The changes have been tested on a MacBook Pro with an M1 Pro chip to verify that the model runs smoothly with the specified configurations.

Additional Notes

  • Users are encouraged to monitor system resources and adjust parameters as needed to optimize performance and memory usage.
  • Feedback and further suggestions for improvement are welcome.

WanX-Video-1 and others added 11 commits February 25, 2025 22:54
…o#44)

* Update text2video.py to reduce GPU memory by emptying cache

If offload_model is set, empty_cache() must be called after the model is moved to CPU to actually free the GPU. I verified on a RTX 4090 that without calling empty_cache the model remains in memory and the subsequent vae decoding never finishes.

* Update text2video.py only one empty_cache needed before vae decode
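
As a reference, a rough sketch of the pattern that commit message describes (an illustrative snippet, not the actual text2video.py code; the helper name is made up):

    import torch

    def offload_and_free(model):
        # Move the model to CPU, then release the cached GPU memory.
        # Without empty_cache(), the allocator keeps the freed blocks reserved,
        # so the subsequent VAE decode can still run out of GPU memory.
        model.to("cpu")
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        return model
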
@bakhti-ai
Author

Hi @WanX-Video-1,

I hope you're doing well. I've made some changes to adapt the Wan2.1 text-to-video model for macOS with M1 Pro chips. The key changes include:

  • Compatibility improvements for MPS on macOS.
  • Documentation updates for macOS setup and usage.
  • Command-line adjustments for better performance on macOS.

Could you please review the pull request when you have a moment? Your feedback would be greatly appreciated.

Thank you!

Best regards,
Bakhtiyor

@lorihuang

Thank you for your help, but I encountered the following error during the program execution. Could you please let me know how to resolve it? Thank you!
"""
python(16280) MallocStackLogging: can't turn off malloc stack logging because it was not enabled.
0%| | 0/25 [00:00<?, ?it/s]
"""

@bakhti-ai
Author

Thank you for your help, but I encountered the following error during the program execution. Could you please let me know how to resolve it? Thank you! """ python(16280) MallocStackLogging: can't turn off malloc stack logging because it was not enabled. 0%| | 0/25 [00:00<?, ?it/s] """

This is just a warning message from macOS; I also encountered it.
It doesn't impact the program's execution.
The progress bar that follows indicates the program is working as expected.
You can safely ignore this message and just wait for the progress bar to reach 100%.

@lorihuang


Got it! Thanks a lot.

@TreasureJade

I have successfully run your solution, thank you ^ ^

g7adrian and others added 7 commits February 27, 2025 11:38
…o#44)

* Update text2video.py to reduce GPU memory by emptying cache

If offload_model is set, empty_cache() must be called after the model is moved to CPU to actually free the GPU. I verified on a RTX 4090 that without calling empty_cache the model remains in memory and the subsequent vae decoding never finishes.

* Update text2video.py only one empty_cache needed before vae decode
Add model files download step
@bakhti-ai bakhti-ai force-pushed the macos-compatibility branch 4 times, most recently from f8cca94 to 2beb726 on February 27, 2025 07:09
@Volutionn

Thanks @bakhti-uzb ! I was working on the same PR but ran into this error on my M4 Max:
RuntimeError: Input type (MPSFloatType) and weight type (MPSHalfType) should be the same

I'm hitting the same error on your PR, have you run into this?

@bakhti-ai
Author

Thanks @bakhti-uzb ! I was working on the same PR but ran into this error on my M4 Max: RuntimeError: Input type (MPSFloatType) and weight type (MPSHalfType) should be the same

I'm hitting the same error on your PR, have you run into this?

I haven't encountered that exact error on my M1 Pro, but it's a tensor dtype mismatch on the MPS backend: the model is using half-precision weights (FP16) with full-precision inputs (FP32).
You could try a few approaches:

  1. Force everything to use the same precision by adding this before model loading:
    torch.mps.set_per_process_memory_fraction(0.8)  # Optional but helpful
    torch.set_default_dtype(torch.float32)  # Force full precision
  2. Alternatively, if you prefer half precision for memory efficiency:
    torch.set_default_dtype(torch.float16)

  3. You might need to explicitly convert tensors to match types in the model code. Look for where tensors are being sent to the MPS device and ensure consistent typing (see the sketch below).

Let me know if you find a solution that works consistently - I'd be happy to incorporate it into the PR to make it work better across different Apple Silicon chips.
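
For illustration, here is a minimal, self-contained sketch of that kind of explicit dtype alignment (a hypothetical example, not code from this repo):

    import torch
    import torch.nn as nn

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    # Toy stand-in for a model whose weights are half precision on MPS.
    model = nn.Linear(8, 8).to(device=device, dtype=torch.float16)

    x = torch.randn(2, 8)  # created as float32 on the CPU by default

    # Cast the input to the model's device and weight dtype so the MPS backend
    # sees consistent types (avoids the MPSFloatType/MPSHalfType mismatch).
    weight_dtype = next(model.parameters()).dtype
    x = x.to(device=device, dtype=weight_dtype)

    out = model(x)
    print(out.dtype)  # torch.float16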

@Volutionn

Thanks @bakhti-uzb, I managed to make it work. The issue was that I was trying to run the I2V while you focused on the T2V. Here's the change I made to get the I2V working:

  • image2video.py: I removed the cpu() in img[None].cpu() to have the img use the same device when using MPS
  • wan_i2v_14B.py: I changed this line i2v_14B.clip_dtype = torch.float16 to use float32
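
Paraphrasing the two edits described above as code (approximate, based only on this description, not an exact diff):

    # image2video.py: keep the image tensor on the active device instead of
    # forcing it to CPU, so it matches the MPS tensors downstream.
    #   before: img[None].cpu()
    #   after:  img[None]

    # wan_i2v_14B.py: use full precision for the CLIP weights on MPS.
    #   before: i2v_14B.clip_dtype = torch.float16
    #   after:  i2v_14B.clip_dtype = torch.float32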

@tanmay1100 left a comment

  File "/Users/me/Wan2.1/wan/modules/attention.py", line 186, in attention
    out = torch.nn.functional.scaled_dot_product_attention(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Invalid buffer size: 47.98 GB

This pops up after making all these changes, on m3 pro.

@bakhti-ai
Author

Thanks @bakhti-uzb, I managed to make it work. The issue was that I was trying to run the I2V while you focused on the T2V. Here's the change I made to get the I2V working:

  • image2video.py: I removed the cpu() in img[None].cpu() to have the img use the same device when using MPS
  • wan_i2v_14B.py: I changed this line i2v_14B.clip_dtype = torch.float16 to use float32

Thanks for letting us know. I've included these changes in the repo.

@bakhti-ai
Author

  File "/Users/me/Wan2.1/wan/modules/attention.py", line 186, in attention
    out = torch.nn.functional.scaled_dot_product_attention(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Invalid buffer size: 47.98 GB

This pops up after making all these changes, on m3 pro.

It would be easier to help if you provided some more information, like how you are trying to generate and with what options (frame_num etc.).

For now I can suggest these:

  1. Lower resolution: try a smaller video size like --size "320*576" instead of your current settings.
  2. Reduce frame count: use fewer frames with --frame_num 8
  3. Increase memory efficiency:
    export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8
  4. Use CPU mode: while slower, it would avoid the memory limitation:
    python generate.py --task t2v-1.3B --device cpu [other parameters]
  5. Set an explicit memory fraction:
    # Add this somewhere at the top of the script
    import torch
    torch.mps.set_per_process_memory_fraction(0.7)  # Adjust value as needed

@tanmay1100

tanmay1100 commented Feb 27, 2025

  File "/Users/me/Wan2.1/wan/modules/attention.py", line 186, in attention
    out = torch.nn.functional.scaled_dot_product_attention(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Invalid buffer size: 47.98 GB

This pops up after making all these changes, on m3 pro.

It would be easy to help if you provide some more information like how you are trying to generate with what kind of options (frame_num etc.)

For now I can suggest these:

  1. Lower resolution: Suggest they try a smaller video size like --size "320*576" instead of your current settings.
  2. Reduce frame count: Use fewer frames with --frame_num 8
  3. Increase memory efficiency:
    export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8
  4. Use CPU mode: While slower, it would avoid the memory limitation:
    python generate.py --task t2v-1.3B --device cpu [other parameters]
  5. Set explicit memory fraction:
#Add this somewhere at the top of the script
import torch
torch.mps.set_per_process_memory_fraction(0.7)  # Adjust value as needed

I'm not explicitly specifying frame numbers; I just copied the example from the readme:

python generate.py  --task t2v-1.3B --size "832*480" --ckpt_dir ../Wan2.1-T2V-1.3B --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

Did this: export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8

Now it throws: RuntimeError: invalid low watermark ratio 1.4

Also set the torch.mps.set_per_process_memory_fraction(0.7)
Throws:

 File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/mps/__init__.py", line 108, in set_per_process_memory_fraction
   torch._C._mps_setMemoryFraction(fraction)
RuntimeError: invalid low watermark ratio 1.4

Setting the watermark ratio back to 0, and running this command:

python generate.py  --task t2v-1.3B --size "832*480" --ckpt_dir ../Wan2.1-T2V-1.3B --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --frame_num 5 --offload_model True --t5_cpu --device "cpu"

throws:

  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 610, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 605, in _conv_forward
    return F.conv3d(
           ^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: MPS backend out of memory (MPS allocated: 6.11 GB, other allocations: 2.27 GB, max allowed: 8.40 GB). Tried to allocate 73.12 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

@bakhti-ai
Author

  File "/Users/me/Wan2.1/wan/modules/attention.py", line 186, in attention
    out = torch.nn.functional.scaled_dot_product_attention(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Invalid buffer size: 47.98 GB

This pops up after making all these changes, on m3 pro.

It would be easy to help if you provide some more information like how you are trying to generate with what kind of options (frame_num etc.)
For now I can suggest these:

  1. Lower resolution: Suggest they try a smaller video size like --size "320*576" instead of your current settings.
  2. Reduce frame count: Use fewer frames with --frame_num 8
  3. Increase memory efficiency:
    export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8
  4. Use CPU mode: While slower, it would avoid the memory limitation:
    python generate.py --task t2v-1.3B --device cpu [other parameters]
  5. Set explicit memory fraction:
#Add this somewhere at the top of the script
import torch
torch.mps.set_per_process_memory_fraction(0.7)  # Adjust value as needed

not explicitly specifying frame numbers, just copied their example from readme:

python generate.py  --task t2v-1.3B --size "832*480" --ckpt_dir ../Wan2.1-T2V-1.3B --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

Did this: export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8

Now it throws: RuntimeError: invalid low watermark ratio 1.4

Also set the torch.mps.set_per_process_memory_fraction(0.7) Throws:

 File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/mps/__init__.py", line 108, in set_per_process_memory_fraction
   torch._C._mps_setMemoryFraction(fraction)`
RuntimeError: invalid low watermark ratio 1.4

Setting the watermark ratio back to 0, and running this command:

python generate.py  --task t2v-1.3B --size "832*480" --ckpt_dir ../Wan2.1-T2V-1.3B --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --frame_num 5 --offload_model True --t5_cpu --device "cpu"

throws:

  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 610, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 605, in _conv_forward
    return F.conv3d(
           ^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: MPS backend out of memory (MPS allocated: 6.11 GB, other allocations: 2.27 GB, max allowed: 8.40 GB). Tried to allocate 73.12 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

It looks like you're encountering a series of memory-related issues that are common when running large AI models on a Mac.

Here's what I recommend trying:

  1. Combined approach - try this exact command sequence:

    # First unset any existing memory settings
    unset PYTORCH_MPS_HIGH_WATERMARK_RATIO

    # Then run with these specific settings
    python generate.py --task t2v-1.3B --size "416*256" --frame_num 4 --sample_steps 10 --ckpt_dir ../Wan2.1-T2V-1.3B --offload_model True --t5_cpu --device cpu --prompt "Two anthropomorphic cats in comfy boxing gear"

  2. Reduce complexity all around:

    • Use a significantly smaller resolution
    • Reduce the frame count to the minimum (4)
    • Shorten the prompt
    • Reduce the sample steps

  3. Memory management in Python - add these at the top of your script:

    import gc
    import torch

    # Force garbage collection
    gc.collect()

    # Release cached accelerator memory: torch.cuda.empty_cache() on a CUDA machine;
    # on Apple Silicon, recent PyTorch builds expose torch.mps.empty_cache().
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    elif torch.backends.mps.is_available():
        torch.mps.empty_cache()

  4. Close other applications - make sure you don't have other memory-intensive apps running.

The key insight is that when using large models, especially on a Mac, you typically need to be much more conservative with generation parameters than the default examples suggest. Start with minimal settings that work, then gradually increase until you find your system's limit.
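
As a side note on the "invalid low watermark ratio 1.4" error reported above: this appears to happen because the MPS allocator requires the high watermark ratio to be at least as large as the low watermark ratio (its default appears to be 1.4, matching the number in the error), so lowering only the high one trips the check. A hedged sketch of one way around it, assuming these environment variables behave as in recent PyTorch builds (the equivalent shell exports before launching generate.py should work too):

    import os

    # Lower both watermarks together so that high >= low still holds.
    # These must be set before PyTorch initializes the MPS backend.
    os.environ["PYTORCH_MPS_LOW_WATERMARK_RATIO"] = "0.7"
    os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.8"

    import torch  # import after setting the variables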

@tanmay1100

  File "/Users/me/Wan2.1/wan/modules/attention.py", line 186, in attention
    out = torch.nn.functional.scaled_dot_product_attention(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Invalid buffer size: 47.98 GB

This pops up after making all these changes, on m3 pro.

It would be easy to help if you provide some more information like how you are trying to generate with what kind of options (frame_num etc.)
For now I can suggest these:

  1. Lower resolution: Suggest they try a smaller video size like --size "320*576" instead of your current settings.
  2. Reduce frame count: Use fewer frames with --frame_num 8
  3. Increase memory efficiency:
    export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8
  4. Use CPU mode: While slower, it would avoid the memory limitation:
    python generate.py --task t2v-1.3B --device cpu [other parameters]
  5. Set explicit memory fraction:
#Add this somewhere at the top of the script
import torch
torch.mps.set_per_process_memory_fraction(0.7)  # Adjust value as needed

not explicitly specifying frame numbers, just copied their example from readme:

python generate.py  --task t2v-1.3B --size "832*480" --ckpt_dir ../Wan2.1-T2V-1.3B --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

Did this: export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.8
Now it throws: RuntimeError: invalid low watermark ratio 1.4
Also set the torch.mps.set_per_process_memory_fraction(0.7) Throws:

 File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/mps/__init__.py", line 108, in set_per_process_memory_fraction
   torch._C._mps_setMemoryFraction(fraction)`
RuntimeError: invalid low watermark ratio 1.4

Setting the watermark ratio back to 0, and running this command:

python generate.py  --task t2v-1.3B --size "832*480" --ckpt_dir ../Wan2.1-T2V-1.3B --sample_shift 8 --sample_guide_scale 6 --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." --frame_num 5 --offload_model True --t5_cpu --device "cpu"

throws:

  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 610, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 605, in _conv_forward
    return F.conv3d(
           ^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.12/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: MPS backend out of memory (MPS allocated: 6.11 GB, other allocations: 2.27 GB, max allowed: 8.40 GB). Tried to allocate 73.12 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

It looks like you're encountering a series of memory-related issues that are common when running large AI models on Mac.

Here's what I recommend trying:

  1. Combined approach - try this exact command sequence:

    # First unset any existing memory settings
    unset PYTORCH_MPS_HIGH_WATERMARK_RATIO
    
    # Then run with these specific settings
    python generate.py --task t2v-1.3B --size "416*256" --frame_num 4 --sample_steps 10 --ckpt_dir ../Wan2.1-T2V-1.3B --offload_model True --t5_cpu --device cpu --prompt "Two anthropomorphic cats in comfy boxing gear"
  2. Reduce complexity all around:

    • Use a significantly smaller resolution
    • Reduce frame count to minimum (4)
    • Reduce the prompt length
    • Reduce sample steps
  3. Memory management in Python - add these at the top of your script:

    import gc
    import torch
    
    # Force garbage collection
    gc.collect()
    torch.cuda.empty_cache() if torch.cuda.is_available() else None
  4. Close other applications - Make sure you don't have other memory-intensive apps running

The key insight is that when using large models, especially on Mac, you typically need to be much more conservative with generation parameters than the default examples suggest. Start with minimal settings that work, then gradually increase until you find your system's limit.

generate.py: error: argument --size: invalid choice: '416*256' (choose from '720*1280', '1280*720', '480*832', '832*480', '1024*1024')

it doesn't let me go below 480p

@agentx-cgn

Still get MPS backend out of memory with --device cpu, but did generate video, just tripped at the end.

@MrShakila

Can we use this model on an M3 Pro chip as well?

@bakhti-ai
Author

can we use this model in m3 Pro chip as well ?

This should probably work, but I haven't tested it specifically on an M3 Pro; I ran it on my M1 Pro and it worked. Let me know if you hit any errors while trying it out.

@HighDoping

Still get MPS backend out of memory with --device cpu, but did generate video, just tripped at the end.

You may try my fork, which will save a few GB of RAM.

can we use this model in m3 Pro chip as well ?

32GB M4 runs T2V-1.3B fine.

@chikiuso

chikiuso commented Mar 3, 2025

Hi @bakhti-ai, thanks for your contribution to the Mac mod version. I just tried to install it on my MacBook, but the process kills itself after loading and creating the pipeline and model. My MacBook is an M2 with just 8 GB of RAM; would that be the reason it doesn't run? I'm already using the 480p model, thanks!!

@bakhti-ai
Author

Hi @bakhti-ai , thanks for your contribution to the Mac mod version, I just tried to install on my MacBook but it will kill the process itself after loading and creating pipeline and model, my MacBook is M2 and just having 8Gb Ram, would that be the reason it doesn't run? I already am using the 480p model, thanks!!

Even at 480p, video generation models are quite memory-intensive, so yes, I think you are hitting an out-of-memory issue. Can you provide any error messages, warnings, or logs from running it on your machine?

@bakhti-ai bakhti-ai changed the title from "Add macOS M1 Pro Compatibility and Documentation Enhancements" to "Add macOS Compatibility" on Mar 4, 2025
@chklovski

chklovski commented Mar 5, 2025

Using --frame_num over 45 leads to: Error: total bytes of NDArray > 2**32

This is on a 128 GB M3 Max, with the options --task t2v-1.3B --size "480*832". Any idea whether this can be fixed?

@bakhti-ai
Author

Using --frame_num over 45 leads to: Error: total bytes of NDArray > 2**32'

This is on a 128 GB M3 Max, with the options --task t2v-1.3B --size "480*832". Any idea whether this can be fixed?

I'm not sure if this will work. I asked ChatGPT about this issue, and here is its response:

Why This Happens

  • NumPy stores large tensors as arrays, and by default, it limits array sizes to 4GB (2^32 bytes) when using 32-bit indexing.

  • Since the code converts NumPy arrays to PyTorch tensors (torch.from_numpy(...)), a large frame count (45+) at 480×832 resolution can exceed this limit.

  • NumPy’s default array dtype (float64) makes this worse because it uses 8 bytes per value, quickly inflating memory usage.

How to Fix It
✅ 1. Use float32 Instead of float64
Modify any instance where NumPy arrays are created and force float32 instead of defaulting to float64.

For example, in fm_solvers.py and fm_solvers_unipc.py, update:

sigmas = torch.from_numpy(sigmas.astype(np.float32)).to(dtype=torch.float32)
self.sigmas = torch.from_numpy(sigmas.astype(np.float32))
self.timesteps = torch.from_numpy(timesteps.astype(np.float32))

This reduces memory usage by 50%.

✅ 2. Convert NumPy Arrays to PyTorch Tensors Earlier
Instead of handling large NumPy arrays, move calculations to PyTorch before NumPy gets too large.

For example, in utils.py:
for frame in tensor.numpy():
Change it to:
for frame in tensor.detach().cpu().float().numpy():
This ensures the tensor is in float32 before NumPy processes it.

✅ 3. Check NumPy’s allow_large_arrays Flag (Not Always Available)
Some versions of NumPy support large arrays, but it depends on platform and settings. Try:

np.seterr(over='ignore')
np.set_printoptions(threshold=np.inf)

If errors persist, upgrading NumPy might help:
pip install --upgrade numpy

Final Verdict
✔ Yes, the error is due to NumPy’s 4GB per-array limit.
✔ Fix it by forcing float32 and using PyTorch tensors earlier.
✔ Reducing frame_num is a temporary workaround, but not a real fix.

@HighDoping

Using --frame_num over 45 leads to: Error: total bytes of NDArray > 2**32'

This is on a 128 GB M3 Max, with the options --task t2v-1.3B --size "480*832". Any idea whether this can be fixed?

I haven't got enough RAM to test it, but it may be due to an MPSNDArray limit (pytorch/pytorch#134177). It might be solved by breaking up the array during the generation process, or by trying a newer software version.
