Problem:
When I fine-tune llama2 on a custom dataset, I get the following error: AssertionError: CPUAdam param is on cuda:0 and must be 'cpu', make sure you enabled 'offload_optimizer': 'cpu' in your ZeRO config.
Code:
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel
instruction_dataset = InstructionDataset("./train")
# Initializes the model
model = BaseModel.create("llama2")
# Fine-tune the model
model.finetune(dataset=instruction_dataset)
# Once the model has been fine-tuned, you can run inference
output = model.generate(texts=["Why LLM models are becoming so important?"])
print("Generated output by the model: {}".format(output))
# Save the model
model.save("./llama_weights")
Logs:
Traceback (most recent call last):
File "/workspace/xturingexample/llama2_train.py", line 9, in <module>
model.finetune(dataset=instruction_dataset)
File "/usr/local/lib/python3.10/dist-packages/xturing/models/causal.py", line 119, in finetune
trainer.fit()
File "/usr/local/lib/python3.10/dist-packages/xturing/trainers/lightning_trainer.py", line 187, in fit
self.trainer.fit(self.lightning_model)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 545, in fit
call._call_and_handle_interrupt(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 581, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 990, in _run
results = self._run_stage()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1036, in _run_stage
self.fit_loop.run()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
self.advance()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/fit_loop.py", line 359, in advance
self.epoch_loop.run(self._data_fetcher)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/training_epoch_loop.py", line 136, in run
self.advance(data_fetcher)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/training_epoch_loop.py", line 240, in advance
batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/automatic.py", line 187, in run
self._optimizer_step(batch_idx, closure)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/optimization/automatic.py", line 265, in _optimizer_step
call._call_lightning_module_hook(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 157, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/module.py", line 1282, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 151, in step
step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 230, in optimizer_step
return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 117, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/deepspeed/ops/adam/cpu_adam.py", line 150, in step
assert p.device == device, f"CPUAdam param is on {p.device} and must be 'cpu', make " \
AssertionError: CPUAdam param is on cuda:0 and must be 'cpu', make sure you enabled 'offload_optimizer': 'cpu' in your ZeRO config.
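The assertion comes from DeepSpeed's CPUAdam: it requires every parameter it steps to live on the CPU, which is only the case when ZeRO optimizer offload is enabled. For reference, this is a minimal sketch of the relevant ZeRO section of a DeepSpeed config, assuming a stage-2 setup; the exact config xTuring passes to DeepSpeed is not shown in this report, so the field values here are illustrative:

```python
# Sketch of a DeepSpeed ZeRO config that enables CPU optimizer offload.
# With "offload_optimizer": {"device": "cpu"}, DeepSpeed keeps optimizer
# state (and the master params CPUAdam updates) on the CPU, which is what
# the failing assertion checks for.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",       # must be "cpu" for CPUAdam
            "pin_memory": True,    # faster host<->device transfers
        },
    },
    "train_micro_batch_size_per_gpu": 1,  # illustrative value
}

# Quick sanity check of the config shape before handing it to DeepSpeed
offload = ds_config["zero_optimization"]["offload_optimizer"]
print(offload["device"])
```

If the config xTuring generates uses CPUAdam without this offload section (or with "device" set to something other than "cpu"), the assertion above is exactly the failure you would expect.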