📉 Optimize GRPO memory usage by redefining `per_device_batch_size` as generations per device #2776
+107
−68