CPU Only Support #222

Closed
chriskuchar opened this issue Jun 20, 2023 · 3 comments

@chriskuchar

Hello,
I posted a previous issue about this, #193, but I couldn't get that solution to work. Is there a guide you have for CPU-only use?

@StochasticRomanAgeev
Contributor

Hi @chriskuchar,
Could you please share the issue you are getting with the value 32 in the configuration?

@chriskuchar
Author

chriskuchar commented Jul 10, 2023

Hello @StochasticRomanAgeev ,
This is the code I am running and the error I am getting. I think it is a configuration issue with how cpu_adam runs, but I don't know the back-end code well enough to fix it. It breaks when I call model.finetune.

I am running into two different errors. The first is "Error building extension 'cpu_adam'" when I run:

# Load the dataset
instruction_dataset = InstructionDataset("alpaca_data")

# Initialize the model
model = BaseModel.create("llama")

# Finetune the model
model.finetune(dataset=instruction_dataset)
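
For what it's worth, the "Error building extension 'cpu_adam'" message usually comes from DeepSpeed JIT-compiling its CPUAdam C++ op the first time it is needed, which requires a working C++ toolchain on the machine. A minimal check (a hedged sketch; CPUAdamBuilder is DeepSpeed's own op builder and assumes DeepSpeed is installed, nothing xturing-specific):

# Hedged sketch: ask DeepSpeed whether its CPUAdam op can be built here.
# A False result (or a compile error at this point) would explain the
# "Error building extension 'cpu_adam'" failure above.
from deepspeed.ops.op_builder import CPUAdamBuilder

print(CPUAdamBuilder().is_compatible())  # False usually means a missing C++ compiler or headers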

The second error is documented below with more extensive code.

Also, I changed lora_alpha to 32 per your previous suggestion.

lora_config = LoraConfig(
r=8,
lora_alpha=32,
target_modules=target_modules,
lora_dropout=0.05,
bias="none",
inference_mode=False,
base_model_name_or_path=self.base_model.__dict__.get("name_or_path", None),
)

from xturing.datasets import InstructionDataset
from xturing.models import BaseModel


import json

from datasets import Dataset, DatasetDict

# Convert the alpaca JSON dataset to HF format


# Right now only HuggingFace datasets are supported, which is why the JSON Alpaca dataset
# needs to be converted to the HuggingFace format. In addition, this HF dataset should have
# 3 columns for instruction finetuning: instruction, text and target.
def preprocess_alpaca_json_data(alpaca_dataset_path: str):
    """Creates a dataset given the alpaca JSON dataset. You can download it here: https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
    :param alpaca_dataset_path: path of the Alpaca dataset
    """
    with open(alpaca_dataset_path) as f:
        alpaca_data = json.load(f)
    instructions = []
    inputs = []
    outputs = []

    for data in alpaca_data:
        instructions.append(data["instruction"])
        inputs.append(data["input"])
        outputs.append(data["output"])

    data_dict = {
        "train": {"instruction": instructions, "text": inputs, "target": outputs}
    }

    dataset = DatasetDict()
    # build a Dataset for each split defined in data_dict
    for k, v in data_dict.items():
        dataset[k] = Dataset.from_dict(v)

    dataset.save_to_disk("./alpaca_data")


preprocess_alpaca_json_data('alpaca_data.json')
# Load the dataset
instruction_dataset = InstructionDataset("alpaca_data")

# Initialize the model
model = BaseModel.create("llama_lora")
# Finetune the model
model.finetune(dataset=instruction_dataset)

# Perform inference
output = model.generate(texts=["Why LLM models are becoming so important?"])

print("Generated output by the model: {}".format(output))
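
As a side note, a quick sanity check on the converted dataset (a hedged sketch, not part of the original script; it only uses the same datasets package imported above):

from datasets import load_from_disk

# Reload the dataset written by preprocess_alpaca_json_data and confirm the
# three columns expected for instruction finetuning are present.
converted = load_from_disk("./alpaca_data")
print(converted["train"].column_names)  # expected: ['instruction', 'text', 'target']
print(converted["train"][0])            # first converted Alpaca record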

trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Missing logger folder: /Users/christopher.beckett/datascience/Chat_Suggestions/lightning_logs

Finding best initial lr: 0%| | 0/100 [00:00<?, ?it/s]
╭──────────────────────────── Traceback (most recent call last) ────────────────────────────╮
│ :1 in │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/models/causa │
│ l.py:88 in finetune │
│ │
│ 85 │ │ │ "instruction_dataset", │
│ 86 │ │ ], "Please make sure the dataset_type is text_dataset or instruction_datase │
│ 87 │ │ trainer = self._make_trainer(dataset, logger) │
│ ❱ 88 │ │ trainer.fit() │
│ 89 │ │
│ 90 │ def evaluate(self, dataset: Union[TextDataset, InstructionDataset]): │
│ 91 │ │ pass │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/trainers/lig │
│ htning_trainer.py:190 in fit │
│ │
│ 187 │ │ │ ) │
│ 188 │ │
│ 189 │ def fit(self): │
│ ❱ 190 │ │ self.trainer.fit(self.lightning_model) │
│ 191 │ │ if self.trainer.checkpoint_callback is not None: │
│ 192 │ │ │ self.trainer.checkpoint_callback.best_model_path │
│ 193 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/trainer.py:520 in fit │
│ │
│ 517 │ │ """ │
│ 518 │ │ model = _maybe_unwrap_optimized(model) │
│ 519 │ │ self.strategy._lightning_module = model │
│ ❱ 520 │ │ call._call_and_handle_interrupt( │
│ 521 │ │ │ self, self._fit_impl, model, train_dataloaders, val_dataloaders, datam │
│ 522 │ │ ) │
│ 523 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/call.py:44 in _call_and_handle_interrupt │
│ │
│ 41 │ │ if trainer.strategy.launcher is not None: │
│ 42 │ │ │ return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trai │
│ 43 │ │ else: │
│ ❱ 44 │ │ │ return trainer_fn(*args, **kwargs) │
│ 45 │ │
│ 46 │ except _TunerExitException: │
│ 47 │ │ _call_teardown_hook(trainer) │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/trainer.py:559 in _fit_impl │
│ │
│ 556 │ │ │ model_provided=True, │
│ 557 │ │ │ model_connected=self.lightning_module is not None, │
│ 558 │ │ ) │
│ ❱ 559 │ │ self._run(model, ckpt_path=ckpt_path) │
│ 560 │ │ │
│ 561 │ │ assert self.state.stopped │
│ 562 │ │ self.training = False │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/trainer.py:915 in _run │
│ │
│ 912 │ │ │
│ 913 │ │ # hook │
│ 914 │ │ if self.state.fn == TrainerFn.FITTING: │
│ ❱ 915 │ │ │ call._call_callback_hooks(self, "on_fit_start") │
│ 916 │ │ │ call._call_lightning_module_hook(self, "on_fit_start") │
│ 917 │ │ │
│ 918 │ │ _log_hyperparams(self) │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/call.py:190 in _call_callback_hooks │
│ │
│ 187 │ │ fn = getattr(callback, hook_name) │
│ 188 │ │ if callable(fn): │
│ 189 │ │ │ with trainer.profiler.profile(f"[Callback]{callback.state_key}.{hook_na │
│ ❱ 190 │ │ │ │ fn(trainer, trainer.lightning_module, *args, **kwargs) │
│ 191 │ │
│ 192 │ if pl_module: │
│ 193 │ │ # restore current_fx when nested context │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/ca │
│ llbacks/lr_finder.py:125 in on_fit_start │
│ │
│ 122 │ │ │ raise _TunerExitException() │
│ 123 │ │
│ 124 │ def on_fit_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") │
│ ❱ 125 │ │ self.lr_find(trainer, pl_module) │
│ 126 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/ca │
│ llbacks/lr_finder.py:109 in lr_find │
│ │
│ 106 │ │
│ 107 │ def lr_find(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> No │
│ 108 │ │ with isolate_rng(): │
│ ❱ 109 │ │ │ self.optimal_lr = _lr_find( │
│ 110 │ │ │ │ trainer, │
│ 111 │ │ │ │ pl_module, │
│ 112 │ │ │ │ min_lr=self._min_lr, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tu │
│ ner/lr_finder.py:269 in _lr_find │
│ │
│ 266 │ lr_finder._exchange_scheduler(trainer) │
│ 267 │ │
│ 268 │ # Fit, lr & loss logged in callback │
│ ❱ 269 │ _try_loop_run(trainer, params) │
│ 270 │ │
│ 271 │ # Prompt if we stopped early │
│ 272 │ if trainer.global_step != num_training + start_steps: │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tu │
│ ner/lr_finder.py:495 in _try_loop_run │
│ │
│ 492 │ loop = trainer.fit_loop │
│ 493 │ loop.load_state_dict(deepcopy(params["loop_state_dict"])) │
│ 494 │ loop.restarting = False │
│ ❱ 495 │ loop.run() │
│ 496 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/fit_loop.py:201 in run │
│ │
│ 198 │ │ while not self.done: │
│ 199 │ │ │ try: │
│ 200 │ │ │ │ self.on_advance_start() │
│ ❱ 201 │ │ │ │ self.advance() │
│ 202 │ │ │ │ self.on_advance_end() │
│ 203 │ │ │ │ self._restarting = False │
│ 204 │ │ │ except StopIteration: │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/fit_loop.py:354 in advance │
│ │
│ 351 │ │ assert self._data_fetcher is not None │
│ 352 │ │ self._data_fetcher.setup(combined_loader) │
│ 353 │ │ with self.trainer.profiler.profile("run_training_epoch"): │
│ ❱ 354 │ │ │ self.epoch_loop.run(self._data_fetcher) │
│ 355 │ │
│ 356 │ def on_advance_end(self) -> None: │
│ 357 │ │ trainer = self.trainer │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/training_epoch_loop.py:133 in run │
│ │
│ 130 │ │ self.on_run_start(data_fetcher) │
│ 131 │ │ while not self.done: │
│ 132 │ │ │ try: │
│ ❱ 133 │ │ │ │ self.advance(data_fetcher) │
│ 134 │ │ │ │ self.on_advance_end() │
│ 135 │ │ │ │ self._restarting = False │
│ 136 │ │ │ except StopIteration: │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/training_epoch_loop.py:218 in advance │
│ │
│ 215 │ │ │ with trainer.profiler.profile("run_training_batch"): │
│ 216 │ │ │ │ if trainer.lightning_module.automatic_optimization: │
│ 217 │ │ │ │ │ # in automatic optimization, there can only be one optimizer │
│ ❱ 218 │ │ │ │ │ batch_output = self.automatic_optimization.run(trainer.optimize │
│ 219 │ │ │ │ else: │
│ 220 │ │ │ │ │ batch_output = self.manual_optimization.run(kwargs) │
│ 221 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/optimization/automatic.py:185 in run │
│ │
│ 182 │ │ # ------------------------------ │
│ 183 │ │ # gradient update with accumulated gradients │
│ 184 │ │ else: │
│ ❱ 185 │ │ │ self._optimizer_step(kwargs.get("batch_idx", 0), closure) │
│ 186 │ │ │
│ 187 │ │ result = closure.consume_result() │
│ 188 │ │ if result.loss is None: │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/optimization/automatic.py:261 in _optimizer_step │
│ │
│ 258 │ │ │ self.optim_progress.optimizer.step.increment_ready() │
│ 259 │ │ │
│ 260 │ │ # model hook │
│ ❱ 261 │ │ call._call_lightning_module_hook( │
│ 262 │ │ │ trainer, │
│ 263 │ │ │ "optimizer_step", │
│ 264 │ │ │ trainer.current_epoch, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/call.py:142 in _call_lightning_module_hook │
│ │
│ 139 │ pl_module._current_fx_name = hook_name │
│ 140 │ │
│ 141 │ with trainer.profiler.profile(f"[LightningModule]{pl_module.__class__.__name__} │
│ ❱ 142 │ │ output = fn(*args, **kwargs) │
│ 143 │ │
│ 144 │ # restore current_fx when nested context │
│ 145 │ pl_module._current_fx_name = prev_fx_name │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/co │
│ re/module.py:1265 in optimizer_step │
│ │
│ 1262 │ │ │ │ │ for pg in optimizer.param_groups: │
│ 1263 │ │ │ │ │ │ pg["lr"] = lr_scale * self.learning_rate │
│ 1264 │ │ """ │
│ ❱ 1265 │ │ optimizer.step(closure=optimizer_closure) │
│ 1266 │ │
│ 1267 │ def optimizer_zero_grad(self, epoch: int, batch_idx: int, optimizer: Optimizer │
│ 1268 │ │ """Override this method to change the default behaviour of optimizer.zer │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/co │
│ re/optimizer.py:158 in step │
│ │
│ 155 │ │ │ raise MisconfigurationException("When `optimizer.step(closure)` is call │
│ 156 │ │
│ 157 │ │ assert self._strategy is not None │
│ ❱ 158 │ │ step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwa │
│ 159 │ │
│ 160 │ │ self._on_after_step() │
│ 161 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/st │
│ rategies/strategy.py:224 in optimizer_step │
│ │
│ 221 │ │ model = model or self.lightning_module │
│ 222 │ │ # TODO(fabric): remove assertion once strategy's optimizer_step typing is f │
│ 223 │ │ assert isinstance(model, pl.LightningModule) │
│ ❱ 224 │ │ return self.precision_plugin.optimizer_step(optimizer, model=model, closure │
│ 225 │ │
│ 226 │ def _setup_model_and_optimizers(self, model: Module, optimizers: List[Optimizer │
│ 227 │ │ """Setup a model and multiple optimizers together. │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/pl │
│ ugins/precision/precision_plugin.py:114 in optimizer_step │
│ │
│ 111 │ ) -> Any: │
│ 112 │ │ """Hook to run the optimizer step.""" │
│ 113 │ │ closure = partial(self._wrap_closure, model, optimizer, closure) │
│ ❱ 114 │ │ return optimizer.step(closure=closure, **kwargs) │
│ 115 │ │
│ 116 │ def _clip_gradients( │
│ 117 │ │ self, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/optim/lr_sched │
│ uler.py:69 in wrapper │
│ │
│ 66 │ │ │ │ instance = instance_ref() │
│ 67 │ │ │ │ instance._step_count += 1 │
│ 68 │ │ │ │ wrapped = func.__get__(instance, cls) │
│ ❱ 69 │ │ │ │ return wrapped(*args, **kwargs) │
│ 70 │ │ │ │ │
│ 71 │ │ │ # Note that the returned function here is no longer a bound method, │
│ 72 │ │ │ # so attributes like `__func__` and `__self__` no longer exist. │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/optim/optimize │
│ r.py:280 in wrapper │
│ │
│ 277 │ │ │ │ │ │ │ raise RuntimeError(f"{func} must return None or a tuple │
│ 278 │ │ │ │ │ │ │ │ │ │ │ f"but got {result}.") │
│ 279 │ │ │ │ │
│ ❱ 280 │ │ │ │ out = func(*args, **kwargs) │
│ 281 │ │ │ │ self._optimizer_step_code() │
│ 282 │ │ │ │ │
│ 283 │ │ │ │ # call optimizer step post hooks │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/optim/optimize │
│ r.py:33 in _use_grad │
│ │
│ 30 │ │ prev_grad = torch.is_grad_enabled() │
│ 31 │ │ try: │
│ 32 │ │ │ torch.set_grad_enabled(self.defaults['differentiable']) │
│ ❱ 33 │ │ │ ret = func(self, *args, **kwargs) │
│ 34 │ │ finally: │
│ 35 │ │ │ torch.set_grad_enabled(prev_grad) │
│ 36 │ │ return ret │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/optim/adamw.py │
│ :148 in step │
│ │
│ 145 │ │ loss = None │
│ 146 │ │ if closure is not None: │
│ 147 │ │ │ with torch.enable_grad(): │
│ ❱ 148 │ │ │ │ loss = closure() │
│ 149 │ │ │
│ 150 │ │ for group in self.param_groups: │
│ 151 │ │ │ params_with_grad = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/pl │
│ ugins/precision/precision_plugin.py:101 in _wrap_closure │
│ │
│ 98 │ │ The closure (generally) runs ``backward`` so this allows inspecting gradien │
│ 99 │ │ consistent with the ``PrecisionPlugin`` subclasses that cannot pass ``optim │
│ 100 │ │ """ │
│ ❱ 101 │ │ closure_result = closure() │
│ 102 │ │ self._after_closure(model, optimizer) │
│ 103 │ │ return closure_result │
│ 104 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/optimization/automatic.py:140 in __call__ │
│ │
│ 137 │ │ return step_output │
│ 138 │ │
│ 139 │ def __call__(self, *args: Any, **kwargs: Any) -> Optional[Tensor]: │
│ ❱ 140 │ │ self._result = self.closure(*args, **kwargs) │
│ 141 │ │ return self._result.loss │
│ 142 │
│ 143 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/optimization/automatic.py:126 in closure │
│ │
│ 123 │ │ self._zero_grad_fn = zero_grad_fn │
│ 124 │ │
│ 125 │ def closure(self, *args: Any, **kwargs: Any) -> ClosureResult: │
│ ❱ 126 │ │ step_output = self._step_fn() │
│ 127 │ │ │
│ 128 │ │ if step_output.closure_loss is None: │
│ 129 │ │ │ self.warning_cache.warn("training_step returned None. If this was o │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/optimization/automatic.py:308 in _training_step │
│ │
│ 305 │ │ trainer = self.trainer │
│ 306 │ │ │
│ 307 │ │ # manually capture logged metrics │
│ ❱ 308 │ │ training_step_output = call._call_strategy_hook(trainer, "training_step", * │
│ 309 │ │ self.trainer.strategy.post_training_step() │
│ 310 │ │ │
│ 311 │ │ result = self.output_result_cls.from_training_step_output(training_step_out │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/call.py:288 in _call_strategy_hook │
│ │
│ 285 │ │ return │
│ 286 │ │
│ 287 │ with trainer.profiler.profile(f"[Strategy]{trainer.strategy.__class__.__name__} │
│ ❱ 288 │ │ output = fn(*args, **kwargs) │
│ 289 │ │
│ 290 │ # restore current_fx when nested context │
│ 291 │ pl_module._current_fx_name = prev_fx_name │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/st │
│ rategies/strategy.py:366 in training_step │
│ │
│ 363 │ │ """ │
│ 364 │ │ with self.precision_plugin.train_step_context(): │
│ 365 │ │ │ assert isinstance(self.model, TrainingStep) │
│ ❱ 366 │ │ │ return self.model.training_step(*args, **kwargs) │
│ 367 │ │
│ 368 │ def post_training_step(self) -> None: │
│ 369 │ │ pass │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/trainers/lig │
│ htning_trainer.py:76 in training_step │
│ │
│ 73 │ │ return self.train_dl │
│ 74 │ │
│ 75 │ def training_step(self, batch, batch_idx): │
│ ❱ 76 │ │ loss = self.model_engine.training_step(batch) │
│ 77 │ │ self.losses.append(loss.item()) │
│ 78 │ │ self.log("loss", loss.item(), prog_bar=True) │
│ 79 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/caus │
│ al.py:89 in training_step │
│ │
│ 86 │ │ │ │ │ attention_mask=batch.get("attention_mask", None), │
│ 87 │ │ │ │ ) │
│ 88 │ │ else: │
│ ❱ 89 │ │ │ outputs = self.model( │
│ 90 │ │ │ │ input_ids=batch["input_ids"], │
│ 91 │ │ │ │ attention_mask=batch.get("attention_mask", None), │
│ 92 │ │ │ ) │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/mod │
│ ule.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/llam │
│ a_utils/llama.py:1059 in forward │
│ │
│ 1056 │ │ ) │
│ 1057 │ │ │
│ 1058 │ │ # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec │
│ ❱ 1059 │ │ outputs = self.model( │
│ 1060 │ │ │ input_ids=input_ids, │
│ 1061 │ │ │ attention_mask=attention_mask, │
│ 1062 │ │ │ past_key_values=past_key_values, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/mod │
│ ule.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/llam │
│ a_utils/llama.py:897 in forward │
│ │
│ 894 │ │ │ │ │ None, │
│ 895 │ │ │ │ ) │
│ 896 │ │ │ else: │
│ ❱ 897 │ │ │ │ layer_outputs = decoder_layer( │
│ 898 │ │ │ │ │ hidden_states, │
│ 899 │ │ │ │ │ attention_mask=attention_mask, │
│ 900 │ │ │ │ │ past_key_value=past_key_value, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/mod │
│ ule.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/llam │
│ a_utils/llama.py:651 in forward │
│ │
│ 648 │ │ hidden_states = self.input_layernorm(hidden_states) │
│ 649 │ │ │
│ 650 │ │ # Self Attention │
│ ❱ 651 │ │ hidden_states, self_attn_weights, present_key_value = self.self_attn( │
│ 652 │ │ │ hidden_states=hidden_states, │
│ 653 │ │ │ past_key_value=past_key_value, │
│ 654 │ │ │ attention_mask=attention_mask, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/mod │
│ ule.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/llam │
│ a_utils/llama.py:528 in forward │
│ │
│ 525 │ │ bsz, q_len, _ = hidden_states.size() │
│ 526 │ │ │
│ 527 │ │ query_states = ( │
│ ❱ 528 │ │ │ self.q_proj(hidden_states) │
│ 529 │ │ │ .view(bsz, q_len, self.num_heads, self.head_dim) │
│ 530 │ │ │ .transpose(1, 2) │
│ 531 │ │ ) │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/mod │
│ ule.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/lora │
│ _engine/lora.py:570 in forward │
│ │
│ 567 │ │ │ │
│ 568 │ │ │ return F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=s │
│ 569 │ │ elif self.r > 0 and not self.merged: │
│ ❱ 570 │ │ │ result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias │
│ 571 │ │ │ if self.r > 0: │
│ 572 │ │ │ │ lora_output = self.lora_B(self.lora_A(self.lora_dropout(x))) * self │
│ 573 │ │ │ │ result = result + lora_output │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

@StochasticRomanAgeev
Contributor

StochasticRomanAgeev commented Jul 12, 2023

Hi @chriskuchar,
We released a version with INT4 support for many different models, so you can now tune models even on a small GPU.
If you still want CPU-only, please try just the llama model type; we are still testing llama_lora in CPU-only mode.
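
A minimal sketch of that suggestion (the exact model keys are assumptions against whichever xturing version is installed):

from xturing.models import BaseModel

# CPU-only path suggested above: the plain "llama" model type.
model = BaseModel.create("llama")

# INT4 LoRA variant mentioned above for small GPUs (key name assumed
# from the xturing docs; check the installed version).
# model = BaseModel.create("llama_lora_int4")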
