RuntimeError: DataLoader worker (pid(s) 69269) exited unexpectedly #232

devprofession · 2023-07-17T11:33:16Z

I tried to fine-tune the Llama model in Google Colab Pro on an A100:
model = BaseModel.create("llama_lora_int8")

In the second epoch, training stopped and the following error appeared:

Loading checkpoint shards: 100% 33/33 [01:35<00:00, 3.03s/it]

INFO:pytorch_lightning.utilities.rank_zero:Using 16bit Automatic Mixed Precision (AMP)
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:IPU available: False, using: 0 IPUs
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name          | Type      | Params
0 | pytorch_model | LoraModel | 6.7 B
4.2 M Trainable params
6.7 B Non-trainable params
6.7 B Total params
26,970.440Total estimated model params size (MB)

trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199

Epoch 0: 0% 0/3515 [00:00<?, ?it/s]

Empty Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
1131 try:
-> 1132 data = self._data_queue.get(timeout=timeout)
1133 return (True, data)

20 frames

Empty:

The above exception was the direct cause of the following exception:

RuntimeError Traceback (most recent call last)

/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
1143 if len(failed_workers) > 0:
1144 pids_str = ', '.join(str(w.pid) for w in failed_workers)
-> 1145 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
1146 if isinstance(e, queue.Empty):
1147 return (False, None)

RuntimeError: DataLoader worker (pid(s) 69269) exited unexpectedly
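
For reference, the figures in the model summary above can be cross-checked with a little arithmetic. The sketch below assumes the size estimate counts 4 bytes per parameter (32-bit floats), which matches the 26,970.440 MB shown in the log. It also assumes roughly 1 byte per parameter for int8 weights, which is only a ballpark figure. The variable names are illustrative.

```python
# Parameter counts copied from the log output above.
trainable_params = 4_194_304
all_params = 6_742_609_920

# Share of parameters that LoRA actually trains, in percent.
trainable_pct = 100 * trainable_params / all_params

# Assumption: the summary's size estimate uses 4 bytes per parameter (fp32).
BYTES_PER_PARAM_FP32 = 4
estimated_size_mb = all_params * BYTES_PER_PARAM_FP32 / 1e6

# Assumption: int8 weights take roughly 1 byte per parameter, ignoring overhead.
BYTES_PER_PARAM_INT8 = 1
estimated_int8_size_mb = all_params * BYTES_PER_PARAM_INT8 / 1e6

print(f"trainable%: {trainable_pct:.6f}")
print(f"fp32 estimate (MB): {estimated_size_mb:,.3f}")
print(f"int8 estimate (MB): {estimated_int8_size_mb:,.3f}")
```

Under these assumptions the reported size is an upper-bound style estimate rather than the memory the quantized model actually occupies.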

devprofession · 2023-07-19T09:28:36Z

Google Colab Pro gives you only 16 GB of GPU memory; you should upgrade to Pro+.
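
To check how much memory a runtime actually provides before starting a run, something like the following could be executed in a notebook cell. This is a rough sketch, not part of the library: `gpu_memory_gb`, `host_memory_gb`, and `report_memory` are hypothetical helpers. The GPU part assumes PyTorch is installed and is skipped otherwise, and the host-RAM part relies on `os.sysconf`, which is available on Linux runtimes. Host RAM is included because DataLoader workers are separate host processes.

```python
import os


def gpu_memory_gb():
    """Return total memory of GPU 0 in GB, or None if no CUDA GPU is usable."""
    try:
        import torch  # assumption: PyTorch is installed in the runtime
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1e9


def host_memory_gb():
    """Return total physical RAM in GB, or None if the platform cannot report it."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / 1e9


def report_memory():
    """Print and return the memory figures that could be determined."""
    figures = {"gpu_gb": gpu_memory_gb(), "host_gb": host_memory_gb()}
    for name, value in figures.items():
        shown = "unavailable" if value is None else f"{value:.1f} GB"
        print(f"{name}: {shown}")
    return figures


report_memory()
```

Comparing these figures with the model's estimated size gives a quick indication of whether the runtime is large enough.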

devprofession closed this as completed Jul 19, 2023
