╭────────────────────────────── Traceback (most recent call last) ─────────────────────────────────╮
│ in :10 │
│ │
│ 7 # Initializes the model │
│ 8 model = BaseModel.create("llama_lora_int8") │
│ 9 # Finetuned the model │
│ ❱ 10 model.finetune(dataset=instruction_dataset) │
│ 11 │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/xturing/models/causal.py:113 in finetune │
│ │
│ 110 │ │ │ "instruction_dataset", │
│ 111 │ │ ], "Please make sure the dataset_type is text_dataset or instruction_dataset" │
│ 112 │ │ trainer = self._make_trainer(dataset, logger) │
│ ❱ 113 │ │ trainer.fit() │
│ 114 │ │
│ 115 │ def evaluate(self, dataset: Union[TextDataset, InstructionDataset]): │
│ 116 │ │ pass │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/xturing/trainers/lightning_trainer.py:190 in │
│ fit │
│ │
│ 187 │ │ │ ) │
│ 188 │ │
│ 189 │ def fit(self): │
│ ❱ 190 │ │ self.trainer.fit(self.lightning_model) │
│ 191 │ │ if self.trainer.checkpoint_callback is not None: │
│ 192 │ │ │ self.trainer.checkpoint_callback.best_model_path │
│ 193 │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:529 in │
│ fit │
│ │
│ 526 │ │ """ │
│ 527 │ │ model = _maybe_unwrap_optimized(model) │
│ 528 │ │ self.strategy._lightning_module = model │
│ ❱ 529 │ │ call._call_and_handle_interrupt( │
│ 530 │ │ │ self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, │
│ 531 │ │ ) │
│ 532 │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:41 in │
│ _call_and_handle_interrupt │
│ │
│ 38 │ """ │
│ 39 │ try: │
│ 40 │ │ if trainer.strategy.launcher is not None: │
│ ❱ 41 │ │ │ return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, │
│ 42 │ │ return trainer_fn(*args, **kwargs) │
│ 43 │ │
│ 44 │ except _TunerExitException: │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multipr │
│ ocessing.py:99 in launch │
│ │
│ 96 │ │ """ │
│ 97 │ │ self._check_torchdistx_support() │
│ 98 │ │ if self._start_method in ("fork", "forkserver"): │
│ ❱ 99 │ │ │ _check_bad_cuda_fork() │
│ 100 │ │ │
│ 101 │ │ # The default cluster environment in Lightning chooses a random free port number │
│ 102 │ │ # This needs to be done in the main process here before starting processes to en │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/lightning_fabric/strategies/launchers/multipro │
│ cessing.py:189 in _check_bad_cuda_fork │
│ │
│ 186 │ ) │
│ 187 │ if _IS_INTERACTIVE: │
│ 188 │ │ message += " You will have to restart the Python kernel." │
│ ❱ 189 │ raise RuntimeError(message) │
│ 190 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Lightning can't create new processes if CUDA is already initialized. Did you manually call torch.cuda.* functions, have moved the model to the device, or allocated memory on the GPU any other way? Please
remove any such calls, or change the selected strategy. You will have to restart the Python kernel.
All I did was run the beginning of the lora-llama-int8 tutorial:
import gc
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel
Can you first check that your GPU is free using the command nvidia-smi, and then run the script again?
Also, instead of using interactive mode, run the script directly with the command python llama_lora_int8.py.
Finally, make sure your xturing installation is up to date with pip install --upgrade xturing.
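Put together, the steps above look roughly like this on the command line (a sketch; the script name assumes you saved the tutorial as llama_lora_int8.py):

```shell
# 1. Confirm the GPU is idle -- the process list at the bottom of the
#    nvidia-smi output should be empty (tolerate machines without the tool)
nvidia-smi || echo "nvidia-smi not found (no NVIDIA driver on this machine)"

# 2. Upgrade xturing to the latest release
pip install --upgrade xturing || echo "upgrade failed; check network/permissions"

# 3. Run the tutorial as a plain script rather than inside a notebook or
#    REPL, so CUDA is never initialized before Lightning forks workers
test -f llama_lora_int8.py && python llama_lora_int8.py \
    || echo "save the tutorial code as llama_lora_int8.py first"
```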
Let us know if the error persists.
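For background, the error is raised by a guard in lightning_fabric's multiprocessing launcher (the _check_bad_cuda_fork frame in the traceback above): fork-based launchers refuse to start once CUDA has been initialized in the parent process, because forked children would inherit a corrupted CUDA context. A simplified sketch of that check, not the library's actual code:

```python
# Hypothetical simplification of lightning_fabric's _check_bad_cuda_fork:
# if CUDA is already initialized when a fork-based launcher starts,
# refuse to fork, since child processes would inherit a broken context.

def check_bad_cuda_fork(cuda_initialized: bool, interactive: bool) -> None:
    if not cuda_initialized:
        return  # CUDA untouched in the parent -> forking is safe
    message = (
        "Lightning can't create new processes if CUDA is already initialized."
    )
    if interactive:
        # In a notebook/REPL the only way to un-initialize CUDA is a restart
        message += " You will have to restart the Python kernel."
    raise RuntimeError(message)

# CUDA never touched: the guard is a no-op
check_bad_cuda_fork(cuda_initialized=False, interactive=True)

# CUDA already initialized in a notebook: the guard raises
try:
    check_bad_cuda_fork(cuda_initialized=True, interactive=True)
except RuntimeError as err:
    print(err)
```

This is why running the tutorial as a fresh script avoids the error: nothing has initialized CUDA in the parent process before Lightning forks its workers.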
Hey @christi7,
Glad it works! That functionality is not in the library yet, but you are welcome to contribute it yourself. Here is the contribution guide; you will need to add a class here.
I am getting this error
RuntimeError: Lightning can't create new processes if CUDA is already initialized. Did you manually call torch.cuda.* functions, have moved the model to the device, or allocated memory on the GPU any other way? Please remove any such calls, or change the selected strategy. You will have to restart the Python kernel.
All I did was run the beginning of the lora-llama-int8 tutorial:

import gc
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel

instruction_dataset = InstructionDataset("./xturing_data")

# Initializes the model
model = BaseModel.create("llama_lora_int8")
# Fine-tunes the model
model.finetune(dataset=instruction_dataset)
Do you know what might be the issue?