
RuntimeError: Lightning can't create new processes if CUDA is already initialized. #231

Closed
christina-nasika-edo opened this issue Jul 13, 2023 · 3 comments

christina-nasika-edo commented Jul 13, 2023

I am getting this error:

─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:10 │
│ │
│ 7 # Initializes the model │
│ 8 model = BaseModel.create("llama_lora_int8") │
│ 9 # Finetuned the model │
│ ❱ 10 model.finetune(dataset=instruction_dataset) │
│ 11 │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/xturing/models/causal.py:113 in finetune │
│ │
│ 110 │ │ │ "instruction_dataset", │
│ 111 │ │ ], "Please make sure the dataset_type is text_dataset or instruction_dataset" │
│ 112 │ │ trainer = self._make_trainer(dataset, logger) │
│ ❱ 113 │ │ trainer.fit() │
│ 114 │ │
│ 115 │ def evaluate(self, dataset: Union[TextDataset, InstructionDataset]): │
│ 116 │ │ pass │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/xturing/trainers/lightning_trainer.py:190 in │
│ fit │
│ │
│ 187 │ │ │ ) │
│ 188 │ │
│ 189 │ def fit(self): │
│ ❱ 190 │ │ self.trainer.fit(self.lightning_model) │
│ 191 │ │ if self.trainer.checkpoint_callback is not None: │
│ 192 │ │ │ self.trainer.checkpoint_callback.best_model_path │
│ 193 │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:529 in │
│ fit │
│ │
│ 526 │ │ """ │
│ 527 │ │ model = _maybe_unwrap_optimized(model) │
│ 528 │ │ self.strategy._lightning_module = model │
│ ❱ 529 │ │ call._call_and_handle_interrupt( │
│ 530 │ │ │ self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, │
│ 531 │ │ ) │
│ 532 │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:41 in │
│ _call_and_handle_interrupt │
│ │
│ 38 │ """ │
│ 39 │ try: │
│ 40 │ │ if trainer.strategy.launcher is not None: │
│ ❱ 41 │ │ │ return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, │
│ 42 │ │ return trainer_fn(*args, **kwargs) │
│ 43 │ │
│ 44 │ except _TunerExitException: │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multipr │
│ ocessing.py:99 in launch │
│ │
│ 96 │ │ """ │
│ 97 │ │ self._check_torchdistx_support() │
│ 98 │ │ if self._start_method in ("fork", "forkserver"): │
│ ❱ 99 │ │ │ _check_bad_cuda_fork() │
│ 100 │ │ │
│ 101 │ │ # The default cluster environment in Lightning chooses a random free port number │
│ 102 │ │ # This needs to be done in the main process here before starting processes to en │
│ │
│ /opt/conda/envs/venv/lib/python3.10/site-packages/lightning_fabric/strategies/launchers/multipro │
│ cessing.py:189 in _check_bad_cuda_fork │
│ │
│ 186 │ ) │
│ 187 │ if _IS_INTERACTIVE: │
│ 188 │ │ message += " You will have to restart the Python kernel." │
│ ❱ 189 │ raise RuntimeError(message) │
│ 190 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

RuntimeError: Lightning can't create new processes if CUDA is already initialized. Did you manually call
torch.cuda.* functions, have moved the model to the device, or allocated memory on the GPU any other way? Please
remove any such calls, or change the selected strategy. You will have to restart the Python kernel.
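For reference, a minimal way to check whether the CUDA context has already been initialized in the current process before finetune() is called (a sketch assuming only a standard PyTorch install; torch.cuda.is_initialized() itself does not initialize CUDA):

import torch

# Returns True once any torch.cuda.* call has created the CUDA context.
# Lightning's fork-based launcher refuses to start worker processes
# in a process where this is already True.
print(torch.cuda.is_initialized())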

All I did was run the beginning of the lora-llama-int8 tutorial

import gc

from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel

instruction_dataset = InstructionDataset("./xturing_data")

# Initializes the model
model = BaseModel.create("llama_lora_int8")

# Finetuned the model
model.finetune(dataset=instruction_dataset)

Do you know what might be the issue?

tushar2407 (Contributor) commented

Can you run the script again after making sure your GPU is empty? You can check with the command nvidia-smi.
Also, instead of using interactive mode, run the script directly with the command python llama_lora_int8.py.
Finally, make sure to update xturing to the latest version with pip install xturing --upgrade.
Let us know if the error persists.
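As a rough sketch of such a standalone script (the filename llama_lora_int8.py only matches the command above, and the body simply mirrors the tutorial code from this issue):

# llama_lora_int8.py -- run with: python llama_lora_int8.py
from xturing.datasets.instruction_dataset import InstructionDataset
from xturing.models import BaseModel

def main():
    # Load the instruction dataset and fine-tune the int8 LoRA LLaMA model.
    instruction_dataset = InstructionDataset("./xturing_data")
    model = BaseModel.create("llama_lora_int8")
    model.finetune(dataset=instruction_dataset)

if __name__ == "__main__":
    # The __main__ guard keeps multiprocessing launchers from re-running
    # the fine-tuning code when they import this module in child processes.
    main()

Running it as a fresh process, rather than in a notebook kernel that may already have touched CUDA, avoids the fork-after-CUDA-initialization problem shown in the traceback above.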

christina-nasika-edo (Author) commented

Thank you @tushar2407, I got the script running by following your advice.

Is there a way to tell how the fine-tuning is progressing?
It has been stuck at the same message (Epoch 0: 100%) for about a day.

tushar2407 (Contributor) commented Jul 25, 2023

Hey @christi7,
I am glad it works. The functionality is not in the library yet, but you can contribute it! Here is the contribution guide. You will have to add a class here.
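Since the trainer wraps PyTorch Lightning, the class to contribute would roughly take the shape of a Lightning Callback. A minimal sketch (the hook names come from Lightning's public Callback API; the class name and print format are just illustrative):

from pytorch_lightning.callbacks import Callback

class ProgressReport(Callback):
    """Prints step and epoch counters so long runs show visible progress."""

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        if batch_idx % 100 == 0:
            print(f"epoch {trainer.current_epoch} - global step {trainer.global_step}")

    def on_train_epoch_end(self, trainer, pl_module):
        print(f"finished epoch {trainer.current_epoch}")

Such a callback would then have to be passed to the Trainer (via its callbacks argument) inside xTuring's lightning_trainer.py.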
