Perhaps this is hopeless, but I thought it would be worth asking. I know I am close to the memory limit. Is there still hope of fitting the model on 3 M2 Ultras with 128 GB RAM each? The 4th node is being used on another project, and I was curious whether it would fit in 3 nodes.
I have used sudo sysctl -w iogpu.wired_limit_mb=122000 on all three nodes.
The code runs and I can see memory usage increasing, but then it fails, apparently before it finishes loading the model, since at the time of failure it is still at 100% CPU utilization.
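For what it's worth, here is a minimal sketch of applying the same wired limit to every node in one pass, assuming passwordless ssh and sudo access on the remote machines (the 10.0.0.x addresses are the hosts from the mpirun command below; adjust to your cluster):

# set the wired-memory limit locally and on the two remote nodes
sudo sysctl -w iogpu.wired_limit_mb=122000
for host in 10.0.0.1 10.0.0.3; do
  ssh -t "$host" 'sudo sysctl -w iogpu.wired_limit_mb=122000'
done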
/opt/homebrew/bin/mpirun --mca oob_tcp_if_include bridge0 --mca btl_tcp_if_include bridge0 \
--map-by ppr:1:node --mca coll_tuned_use_dynamic_rules 1 \
--mca coll_tuned_allreduce_algorithm 5 --mca btl_tcp_links 4 \
--mca mpi_thread_multiple 0 --mca btl_tcp_eager_limit 4194304 \
--mca btl_tcp_sndbuf 8388608 --mca btl_tcp_rcvbuf 8388608 --mca btl self,tcp \
-x DYLD_LIBRARY_PATH=/opt/homebrew/lib/ \
-np 3 --host 10.0.0.1:1,10.0.0.3:1,localhost:1 \
/Users/m2/anaconda3/envs/pythonProject_StreamLit/bin/python /Users/m2/pipeline_generate.py \
--model /Volumes/PACIFIC-GROVE/DeepSeek-R1-3bit \
--prompt "What's better a straight or a flush in texas hold'em?" \
--max-tokens 1024
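As a side note (not part of the original command), the same host list can be reused for a quick sanity check before the full run, to confirm that all three ranks launch over bridge0 and that the wired limit is in effect on each node:

# run a trivial command on every rank and read back the wired limit (no sudo needed for reads)
/opt/homebrew/bin/mpirun --mca oob_tcp_if_include bridge0 --mca btl_tcp_if_include bridge0 \
  --mca btl self,tcp -np 3 --host 10.0.0.1:1,10.0.0.3:1,localhost:1 \
  sh -c 'hostname; sysctl iogpu.wired_limit_mb'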
[WARNING] Generating with a model that requires 116168 MB which is close to the maximum recommended size of 122000 MB. This can be slow. See the documentation for possible work-arounds: https://github.com/ml-explore/mlx-examples/tree/main/llms#large-models
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Caused GPU Timeout Error (00000002:kIOGPUCommandBufferCallbackErrorTimeout)
[Mendocino:28151] *** Process received signal ***
[Mendocino:28151] Signal: Abort trap: 6 (6)
[Mendocino:28151] Signal code: (0)
[Mendocino:28151] [ 0] 0 libsystem_platform.dylib 0x000000019e542e04 _sigtramp + 56
[Mendocino:28151] [ 1] 0 libsystem_pthread.dylib 0x000000019e50bf70 pthread_kill + 288
[Mendocino:28151] [ 2] 0 libsystem_c.dylib 0x000000019e418908 abort + 128
[Mendocino:28151] [ 3] 0 libc++abi.dylib 0x000000019e4c244c _ZN10__cxxabiv130__aligned_malloc_with_fallbackEm + 0
[Mendocino:28151] [ 4] 0 libc++abi.dylib 0x000000019e4b0a24 _ZL28demangling_terminate_handlerv + 320
[Mendocino:28151] [ 5] 0 libobjc.A.dylib 0x000000019e1593f4 _ZL15_objc_terminatev + 172
[Mendocino:28151] [ 6] 0 libc++abi.dylib 0x000000019e4c1710 _ZSt11__terminatePFvvE + 16
[Mendocino:28151] [ 7] 0 libc++abi.dylib 0x000000019e4c16b4 _ZSt9terminatev + 108
[Mendocino:28151] [ 8] 0 libdispatch.dylib 0x000000019e359688 _dispatch_client_callout4 + 40
[Mendocino:28151] [ 9] 0 libdispatch.dylib 0x000000019e375c88 _dispatch_mach_msg_invoke + 464
[Mendocino:28151] [10] 0 libdispatch.dylib 0x000000019e360a38 _dispatch_lane_serial_drain + 352
[Mendocino:28151] [11] 0 libdispatch.dylib 0x000000019e3769dc _dispatch_mach_invoke + 456
[Mendocino:28151] [12] 0 libdispatch.dylib 0x000000019e360a38 _dispatch_lane_serial_drain + 352
[Mendocino:28151] [13] 0 libdispatch.dylib 0x000000019e361764 _dispatch_lane_invoke + 432
[Mendocino:28151] [14] 0 libdispatch.dylib 0x000000019e360a38 _dispatch_lane_serial_drain + 352
[Mendocino:28151] [15] 0 libdispatch.dylib 0x000000019e361730 _dispatch_lane_invoke + 380
[Mendocino:28151] [16] 0 libdispatch.dylib 0x000000019e36c9a0 _dispatch_root_queue_drain_deferred_wlh + 288
[Mendocino:28151] [17] 0 libdispatch.dylib 0x000000019e36c1ec _dispatch_workloop_worker_thread + 540
[Mendocino:28151] [18] 0 libsystem_pthread.dylib 0x000000019e5083d8 _pthread_wqthread + 288
[Mendocino:28151] [19] 0 libsystem_pthread.dylib 0x000000019e5070f0 start_wqthread + 8
[Mendocino:28151] *** End of error message ***
--------------------------------------------------------------------------
prterun noticed that process rank 2 with PID 28151 on node Mendocino exited on
signal 6 (Abort trap: 6).
Yes, I added a fix for that in the most recent mlx-lm, but it's not on PyPI yet. If you build the package from source it should work.
I think 3x128 GB is enough to run it in 3-bit (but not 4-bit). Also, I need to upload a bf16 version of the model; the fp16 version doesn't work as well, unfortunately. So just a heads up if you see any suspicious behavior.
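If it helps, a rough sketch of installing mlx-lm from source on each node, assuming the package still lives under llms/ in the mlx-examples repo (as the warning URL above suggests); run this inside the same Python environment the mpirun command points at:

# install mlx-lm from the current source tree into the active Python environment
git clone https://github.com/ml-explore/mlx-examples.git
cd mlx-examples/llms
pip install -e .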