CPU Only Support #222

Closed
chriskuchar opened this issue Jun 20, 2023 · 3 comments

@chriskuchar

Hello,
I posted a previous issue about this, #193, but I couldn't get that solution to work. Is there a guide you have for CPU-only use?

@StochasticRomanAgeev
Contributor

Hi @chriskuchar,
Could you please share the issue you are getting with the value 32 in the configuration?

@chriskuchar
Author

chriskuchar commented Jul 10, 2023

Hello @StochasticRomanAgeev ,
This is the code I am running and the error I am getting. I think it is a configuration issue with how cpu_adam runs, but I don't know the back-end code well enough to fix it. It breaks when I call model.finetune.

I am running into two different errors. The first is "Error building extension 'cpu_adam'" when I run:

# Load the dataset
instruction_dataset = InstructionDataset("alpaca_data")

# Initialize the model
model = BaseModel.create("llama")

# Finetune the model
model.finetune(dataset=instruction_dataset)
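
For what it's worth, the "Error building extension 'cpu_adam'" message usually comes from DeepSpeed JIT-compiling its CPUAdam C++ op the first time it is needed, which requires a working C++ toolchain on the machine. A minimal check (a hedged sketch; CPUAdamBuilder is DeepSpeed's own op builder and assumes DeepSpeed is installed, nothing xturing-specific):

# Hedged sketch: ask DeepSpeed whether its CPUAdam op can be built here.
# A False result (or a compile error at this point) would explain the
# "Error building extension 'cpu_adam'" failure above.
from deepspeed.ops.op_builder import CPUAdamBuilder

print(CPUAdamBuilder().is_compatible())  # False usually means a missing C++ compiler or headers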

The second error is documented below with more extensive code.

Also, I changed lora_alpha to 32 per your previous suggestion.

lora_config = LoraConfig(
r=8,
lora_alpha=32,
target_modules=target_modules,
lora_dropout=0.05,
bias="none",
inference_mode=False,
base_model_name_or_path=self.base_model.__dict__.get("name_or_path", None),
)

from xturing.datasets import InstructionDataset
from xturing.models import BaseModel


import json

from datasets import Dataset, DatasetDict

# Convert the alpaca JSON dataset to HF format


# Right now only HuggingFace datasets are supported, which is why the JSON Alpaca dataset
# needs to be converted to the HuggingFace format. In addition, this HF dataset should have
# 3 columns for instruction finetuning: instruction, text and target.
def preprocess_alpaca_json_data(alpaca_dataset_path: str):
    """Creates a dataset given the alpaca JSON dataset. You can download it here: https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
    :param alpaca_dataset_path: path of the Alpaca dataset
    """
    with open(alpaca_dataset_path) as f:
        alpaca_data = json.load(f)
    instructions = []
    inputs = []
    outputs = []

    for data in alpaca_data:
        instructions.append(data["instruction"])
        inputs.append(data["input"])
        outputs.append(data["output"])

    data_dict = {
        "train": {"instruction": instructions, "text": inputs, "target": outputs}
    }

    dataset = DatasetDict()
    # build a Dataset for each split defined in data_dict
    for k, v in data_dict.items():
        dataset[k] = Dataset.from_dict(v)

    dataset.save_to_disk("./alpaca_data")


preprocess_alpaca_json_data('alpaca_data.json')
# Load the dataset
instruction_dataset = InstructionDataset("alpaca_data")

# Initialize the model
model = BaseModel.create("llama_lora")
# Finetune the model
model.finetune(dataset=instruction_dataset)

# Perform inference
output = model.generate(texts=["Why LLM models are becoming so important?"])

print("Generated output by the model: {}".format(output))
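
As a side note, a quick sanity check on the converted dataset (a hedged sketch, not part of the original script; it only uses the same datasets package imported above):

from datasets import load_from_disk

# Reload the dataset written by preprocess_alpaca_json_data and confirm the
# three columns expected for instruction finetuning are present.
converted = load_from_disk("./alpaca_data")
print(converted["train"].column_names)  # expected: ['instruction', 'text', 'target']
print(converted["train"][0])            # first converted Alpaca record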

trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Missing logger folder: /Users/christopher.beckett/datascience/Chat_Suggestions/lightning_logs

Finding best initial lr: 0%| | 0/100 [00:00<?, ?it/s]
╭──────────────────────────── Traceback (most recent call last) ────────────────────────────╮
│ :1 in │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/models/causa │
│ l.py:88 in finetune │
│ │
│ 85 │ │ │ "instruction_dataset", │
│ 86 │ │ ], "Please make sure the dataset_type is text_dataset or instruction_datase │
│ 87 │ │ trainer = self._make_trainer(dataset, logger) │
│ ❱ 88 │ │ trainer.fit() │
│ 89 │ │
│ 90 │ def evaluate(self, dataset: Union[TextDataset, InstructionDataset]): │
│ 91 │ │ pass │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/trainers/lig │
│ htning_trainer.py:190 in fit │
│ │
│ 187 │ │ │ ) │
│ 188 │ │
│ 189 │ def fit(self): │
│ ❱ 190 │ │ self.trainer.fit(self.lightning_model) │
│ 191 │ │ if self.trainer.checkpoint_callback is not None: │
│ 192 │ │ │ self.trainer.checkpoint_callback.best_model_path │
│ 193 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/trainer.py:520 in fit │
│ │
│ 517 │ │ """ │
│ 518 │ │ model = _maybe_unwrap_optimized(model) │
│ 519 │ │ self.strategy._lightning_module = model │
│ ❱ 520 │ │ call._call_and_handle_interrupt( │
│ 521 │ │ │ self, self._fit_impl, model, train_dataloaders, val_dataloaders, datam │
│ 522 │ │ ) │
│ 523 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/call.py:44 in _call_and_handle_interrupt │
│ │
│ 41 │ │ if trainer.strategy.launcher is not None: │
│ 42 │ │ │ return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trai │
│ 43 │ │ else: │
│ ❱ 44 │ │ │ return trainer_fn(*args, **kwargs) │
│ 45 │ │
│ 46 │ except _TunerExitException: │
│ 47 │ │ _call_teardown_hook(trainer) │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/trainer.py:559 in _fit_impl │
│ │
│ 556 │ │ │ model_provided=True, │
│ 557 │ │ │ model_connected=self.lightning_module is not None, │
│ 558 │ │ ) │
│ ❱ 559 │ │ self._run(model, ckpt_path=ckpt_path) │
│ 560 │ │ │
│ 561 │ │ assert self.state.stopped │
│ 562 │ │ self.training = False │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/trainer.py:915 in _run │
│ │
│ 912 │ │ │
│ 913 │ │ # hook │
│ 914 │ │ if self.state.fn == TrainerFn.FITTING: │
│ ❱ 915 │ │ │ call._call_callback_hooks(self, "on_fit_start") │
│ 916 │ │ │ call._call_lightning_module_hook(self, "on_fit_start") │
│ 917 │ │ │
│ 918 │ │ _log_hyperparams(self) │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/call.py:190 in _call_callback_hooks │
│ │
│ 187 │ │ fn = getattr(callback, hook_name) │
│ 188 │ │ if callable(fn): │
│ 189 │ │ │ with trainer.profiler.profile(f"[Callback]{callback.state_key}.{hook_na │
│ ❱ 190 │ │ │ │ fn(trainer, trainer.lightning_module, *args, **kwargs) │
│ 191 │ │
│ 192 │ if pl_module: │
│ 193 │ │ # restore current_fx when nested context │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/ca │
│ llbacks/lr_finder.py:125 in on_fit_start │
│ │
│ 122 │ │ │ raise _TunerExitException() │
│ 123 │ │
│ 124 │ def on_fit_start(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") │
│ ❱ 125 │ │ self.lr_find(trainer, pl_module) │
│ 126 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/ca │
│ llbacks/lr_finder.py:109 in lr_find │
│ │
│ 106 │ │
│ 107 │ def lr_find(self, trainer: "pl.Trainer", pl_module: "pl.LightningModule") -> No │
│ 108 │ │ with isolate_rng(): │
│ ❱ 109 │ │ │ self.optimal_lr = _lr_find( │
│ 110 │ │ │ │ trainer, │
│ 111 │ │ │ │ pl_module, │
│ 112 │ │ │ │ min_lr=self._min_lr, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tu │
│ ner/lr_finder.py:269 in _lr_find │
│ │
│ 266 │ lr_finder._exchange_scheduler(trainer) │
│ 267 │ │
│ 268 │ # Fit, lr & loss logged in callback │
│ ❱ 269 │ _try_loop_run(trainer, params) │
│ 270 │ │
│ 271 │ # Prompt if we stopped early │
│ 272 │ if trainer.global_step != num_training + start_steps: │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tu │
│ ner/lr_finder.py:495 in _try_loop_run │
│ │
│ 492 │ loop = trainer.fit_loop │
│ 493 │ loop.load_state_dict(deepcopy(params["loop_state_dict"])) │
│ 494 │ loop.restarting = False │
│ ❱ 495 │ loop.run() │
│ 496 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/fit_loop.py:201 in run │
│ │
│ 198 │ │ while not self.done: │
│ 199 │ │ │ try: │
│ 200 │ │ │ │ self.on_advance_start() │
│ ❱ 201 │ │ │ │ self.advance() │
│ 202 │ │ │ │ self.on_advance_end() │
│ 203 │ │ │ │ self._restarting = False │
│ 204 │ │ │ except StopIteration: │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/fit_loop.py:354 in advance │
│ │
│ 351 │ │ assert self._data_fetcher is not None │
│ 352 │ │ self._data_fetcher.setup(combined_loader) │
│ 353 │ │ with self.trainer.profiler.profile("run_training_epoch"): │
│ ❱ 354 │ │ │ self.epoch_loop.run(self._data_fetcher) │
│ 355 │ │
│ 356 │ def on_advance_end(self) -> None: │
│ 357 │ │ trainer = self.trainer │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/training_epoch_loop.py:133 in run │
│ │
│ 130 │ │ self.on_run_start(data_fetcher) │
│ 131 │ │ while not self.done: │
│ 132 │ │ │ try: │
│ ❱ 133 │ │ │ │ self.advance(data_fetcher) │
│ 134 │ │ │ │ self.on_advance_end() │
│ 135 │ │ │ │ self._restarting = False │
│ 136 │ │ │ except StopIteration: │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/training_epoch_loop.py:218 in advance │
│ │
│ 215 │ │ │ with trainer.profiler.profile("run_training_batch"): │
│ 216 │ │ │ │ if trainer.lightning_module.automatic_optimization: │
│ 217 │ │ │ │ │ # in automatic optimization, there can only be one optimizer │
│ ❱ 218 │ │ │ │ │ batch_output = self.automatic_optimization.run(trainer.optimize │
│ 219 │ │ │ │ else: │
│ 220 │ │ │ │ │ batch_output = self.manual_optimization.run(kwargs) │
│ 221 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/optimization/automatic.py:185 in run │
│ │
│ 182 │ │ # ------------------------------ │
│ 183 │ │ # gradient update with accumulated gradients │
│ 184 │ │ else: │
│ ❱ 185 │ │ │ self._optimizer_step(kwargs.get("batch_idx", 0), closure) │
│ 186 │ │ │
│ 187 │ │ result = closure.consume_result() │
│ 188 │ │ if result.loss is None: │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/optimization/automatic.py:261 in _optimizer_step │
│ │
│ 258 │ │ │ self.optim_progress.optimizer.step.increment_ready() │
│ 259 │ │ │
│ 260 │ │ # model hook │
│ ❱ 261 │ │ call._call_lightning_module_hook( │
│ 262 │ │ │ trainer, │
│ 263 │ │ │ "optimizer_step", │
│ 264 │ │ │ trainer.current_epoch, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/call.py:142 in _call_lightning_module_hook │
│ │
│ 139 │ pl_module._current_fx_name = hook_name │
│ 140 │ │
│ 141 │ with trainer.profiler.profile(f"[LightningModule]{pl_module.__class__.__name__} │
│ ❱ 142 │ │ output = fn(*args, **kwargs) │
│ 143 │ │
│ 144 │ # restore current_fx when nested context │
│ 145 │ pl_module._current_fx_name = prev_fx_name │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/co │
│ re/module.py:1265 in optimizer_step │
│ │
│ 1262 │ │ │ │ │ for pg in optimizer.param_groups: │
│ 1263 │ │ │ │ │ │ pg["lr"] = lr_scale * self.learning_rate │
│ 1264 │ │ """ │
│ ❱ 1265 │ │ optimizer.step(closure=optimizer_closure) │
│ 1266 │ │
│ 1267 │ def optimizer_zero_grad(self, epoch: int, batch_idx: int, optimizer: Optimizer │
│ 1268 │ │ """Override this method to change the default behaviour of optimizer.zer │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/co │
│ re/optimizer.py:158 in step │
│ │
│ 155 │ │ │ raise MisconfigurationException("When `optimizer.step(closure)` is call │
│ 156 │ │
│ 157 │ │ assert self._strategy is not None │
│ ❱ 158 │ │ step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwa │
│ 159 │ │
│ 160 │ │ self._on_after_step() │
│ 161 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/st │
│ rategies/strategy.py:224 in optimizer_step │
│ │
│ 221 │ │ model = model or self.lightning_module │
│ 222 │ │ # TODO(fabric): remove assertion once strategy's optimizer_step typing is f │
│ 223 │ │ assert isinstance(model, pl.LightningModule) │
│ ❱ 224 │ │ return self.precision_plugin.optimizer_step(optimizer, model=model, closure │
│ 225 │ │
│ 226 │ def _setup_model_and_optimizers(self, model: Module, optimizers: List[Optimizer │
│ 227 │ │ """Setup a model and multiple optimizers together. │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/pl │
│ ugins/precision/precision_plugin.py:114 in optimizer_step │
│ │
│ 111 │ ) -> Any: │
│ 112 │ │ """Hook to run the optimizer step.""" │
│ 113 │ │ closure = partial(self._wrap_closure, model, optimizer, closure) │
│ ❱ 114 │ │ return optimizer.step(closure=closure, **kwargs) │
│ 115 │ │
│ 116 │ def _clip_gradients( │
│ 117 │ │ self, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/optim/lr_sched │
│ uler.py:69 in wrapper │
│ │
│ 66 │ │ │ │ instance = instance_ref() │
│ 67 │ │ │ │ instance._step_count += 1 │
│ 68 │ │ │ │ wrapped = func.__get__(instance, cls) │
│ ❱ 69 │ │ │ │ return wrapped(*args, **kwargs) │
│ 70 │ │ │ │ │
│ 71 │ │ │ # Note that the returned function here is no longer a bound method, │
│ 72 │ │ │ # so attributes like `__func__` and `__self__` no longer exist. │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/optim/optimize │
│ r.py:280 in wrapper │
│ │
│ 277 │ │ │ │ │ │ │ raise RuntimeError(f"{func} must return None or a tuple │
│ 278 │ │ │ │ │ │ │ │ │ │ │ f"but got {result}.") │
│ 279 │ │ │ │ │
│ ❱ 280 │ │ │ │ out = func(*args, **kwargs) │
│ 281 │ │ │ │ self._optimizer_step_code() │
│ 282 │ │ │ │ │
│ 283 │ │ │ │ # call optimizer step post hooks │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/optim/optimize │
│ r.py:33 in _use_grad │
│ │
│ 30 │ │ prev_grad = torch.is_grad_enabled() │
│ 31 │ │ try: │
│ 32 │ │ │ torch.set_grad_enabled(self.defaults['differentiable']) │
│ ❱ 33 │ │ │ ret = func(self, *args, **kwargs) │
│ 34 │ │ finally: │
│ 35 │ │ │ torch.set_grad_enabled(prev_grad) │
│ 36 │ │ return ret │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/optim/adamw.py │
│ :148 in step │
│ │
│ 145 │ │ loss = None │
│ 146 │ │ if closure is not None: │
│ 147 │ │ │ with torch.enable_grad(): │
│ ❱ 148 │ │ │ │ loss = closure() │
│ 149 │ │ │
│ 150 │ │ for group in self.param_groups: │
│ 151 │ │ │ params_with_grad = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/pl │
│ ugins/precision/precision_plugin.py:101 in _wrap_closure │
│ │
│ 98 │ │ The closure (generally) runs ``backward`` so this allows inspecting gradien │
│ 99 │ │ consistent with the ``PrecisionPlugin`` subclasses that cannot pass ``optim │
│ 100 │ │ """ │
│ ❱ 101 │ │ closure_result = closure() │
│ 102 │ │ self._after_closure(model, optimizer) │
│ 103 │ │ return closure_result │
│ 104 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/optimization/automatic.py:140 in __call__ │
│ │
│ 137 │ │ return step_output │
│ 138 │ │
│ 139 │ def __call__(self, *args: Any, **kwargs: Any) -> Optional[Tensor]: │
│ ❱ 140 │ │ self._result = self.closure(*args, **kwargs) │
│ 141 │ │ return self._result.loss │
│ 142 │
│ 143 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/optimization/automatic.py:126 in closure │
│ │
│ 123 │ │ self._zero_grad_fn = zero_grad_fn │
│ 124 │ │
│ 125 │ def closure(self, *args: Any, **kwargs: Any) -> ClosureResult: │
│ ❱ 126 │ │ step_output = self._step_fn() │
│ 127 │ │ │
│ 128 │ │ if step_output.closure_loss is None: │
│ 129 │ │ │ self.warning_cache.warn("training_step returned None. If this was o │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/lo │
│ ops/optimization/automatic.py:308 in _training_step │
│ │
│ 305 │ │ trainer = self.trainer │
│ 306 │ │ │
│ 307 │ │ # manually capture logged metrics │
│ ❱ 308 │ │ training_step_output = call._call_strategy_hook(trainer, "training_step", * │
│ 309 │ │ self.trainer.strategy.post_training_step() │
│ 310 │ │ │
│ 311 │ │ result = self.output_result_cls.from_training_step_output(training_step_out │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/tr │
│ ainer/call.py:288 in _call_strategy_hook │
│ │
│ 285 │ │ return │
│ 286 │ │
│ 287 │ with trainer.profiler.profile(f"[Strategy]{trainer.strategy.__class__.__name__} │
│ ❱ 288 │ │ output = fn(*args, **kwargs) │
│ 289 │ │
│ 290 │ # restore current_fx when nested context │
│ 291 │ pl_module._current_fx_name = prev_fx_name │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/pytorch_lightning/st │
│ rategies/strategy.py:366 in training_step │
│ │
│ 363 │ │ """ │
│ 364 │ │ with self.precision_plugin.train_step_context(): │
│ 365 │ │ │ assert isinstance(self.model, TrainingStep) │
│ ❱ 366 │ │ │ return self.model.training_step(*args, **kwargs) │
│ 367 │ │
│ 368 │ def post_training_step(self) -> None: │
│ 369 │ │ pass │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/trainers/lig │
│ htning_trainer.py:76 in training_step │
│ │
│ 73 │ │ return self.train_dl │
│ 74 │ │
│ 75 │ def training_step(self, batch, batch_idx): │
│ ❱ 76 │ │ loss = self.model_engine.training_step(batch) │
│ 77 │ │ self.losses.append(loss.item()) │
│ 78 │ │ self.log("loss", loss.item(), prog_bar=True) │
│ 79 │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/caus │
│ al.py:89 in training_step │
│ │
│ 86 │ │ │ │ │ attention_mask=batch.get("attention_mask", None), │
│ 87 │ │ │ │ ) │
│ 88 │ │ else: │
│ ❱ 89 │ │ │ outputs = self.model( │
│ 90 │ │ │ │ input_ids=batch["input_ids"], │
│ 91 │ │ │ │ attention_mask=batch.get("attention_mask", None), │
│ 92 │ │ │ ) │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/mod │
│ ule.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/llam │
│ a_utils/llama.py:1059 in forward │
│ │
│ 1056 │ │ ) │
│ 1057 │ │ │
│ 1058 │ │ # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec │
│ ❱ 1059 │ │ outputs = self.model( │
│ 1060 │ │ │ input_ids=input_ids, │
│ 1061 │ │ │ attention_mask=attention_mask, │
│ 1062 │ │ │ past_key_values=past_key_values, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/mod │
│ ule.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/llam │
│ a_utils/llama.py:897 in forward │
│ │
│ 894 │ │ │ │ │ None, │
│ 895 │ │ │ │ ) │
│ 896 │ │ │ else: │
│ ❱ 897 │ │ │ │ layer_outputs = decoder_layer( │
│ 898 │ │ │ │ │ hidden_states, │
│ 899 │ │ │ │ │ attention_mask=attention_mask, │
│ 900 │ │ │ │ │ past_key_value=past_key_value, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/mod │
│ ule.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/llam │
│ a_utils/llama.py:651 in forward │
│ │
│ 648 │ │ hidden_states = self.input_layernorm(hidden_states) │
│ 649 │ │ │
│ 650 │ │ # Self Attention │
│ ❱ 651 │ │ hidden_states, self_attn_weights, present_key_value = self.self_attn( │
│ 652 │ │ │ hidden_states=hidden_states, │
│ 653 │ │ │ past_key_value=past_key_value, │
│ 654 │ │ │ attention_mask=attention_mask, │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/mod │
│ ule.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/llam │
│ a_utils/llama.py:528 in forward │
│ │
│ 525 │ │ bsz, q_len, _ = hidden_states.size() │
│ 526 │ │ │
│ 527 │ │ query_states = ( │
│ ❱ 528 │ │ │ self.q_proj(hidden_states) │
│ 529 │ │ │ .view(bsz, q_len, self.num_heads, self.head_dim) │
│ 530 │ │ │ .transpose(1, 2) │
│ 531 │ │ ) │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/torch/nn/modules/mod │
│ ule.py:1501 in _call_impl │
│ │
│ 1498 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward │
│ 1499 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1500 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1501 │ │ │ return forward_call(*args, **kwargs) │
│ 1502 │ │ # Do not call functions when jit is used │
│ 1503 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1504 │ │ backward_pre_hooks = [] │
│ │
│ /Users/christopher.beckett/opt/anaconda3/lib/python3.8/site-packages/xturing/engines/lora │
│ _engine/lora.py:570 in forward │
│ │
│ 567 │ │ │ │
│ 568 │ │ │ return F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=s │
│ 569 │ │ elif self.r > 0 and not self.merged: │
│ ❱ 570 │ │ │ result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias │
│ 571 │ │ │ if self.r > 0: │
│ 572 │ │ │ │ lora_output = self.lora_B(self.lora_A(self.lora_dropout(x))) * self │
│ 573 │ │ │ │ result = result + lora_output │
╰───────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

@StochasticRomanAgeev
Contributor

StochasticRomanAgeev commented Jul 12, 2023

Hi @chriskuchar,
We released a version with INT4 support for many different models, so you can now tune models even on a small GPU.
If you still want CPU-only, please try just the llama model type; we are still testing llama_lora in CPU-only mode.
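
A minimal sketch of that suggestion (the exact model keys are assumptions against whichever xturing version is installed):

from xturing.models import BaseModel

# CPU-only path suggested above: the plain "llama" model type.
model = BaseModel.create("llama")

# INT4 LoRA variant mentioned above for small GPUs (key name assumed
# from the xturing docs; check the installed version).
# model = BaseModel.create("llama_lora_int4")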
