RuntimeError: Error building extension 'cpu_adam' #260

AayushSameerShah · 2023-10-04T11:26:50Z

Loading dataset

from xturing.datasets.text_dataset import TextDataset

dataset = TextDataset({
    "text": sample["text"],
    "target": sample["target"]
})
# >>> [2023-10-04 11:09:41,704] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)

Loading model

from xturing.models import BaseModel

# Load the model
model = BaseModel.create('llama2_lora_int8')

Simple configuration

finetuning_config = model.finetuning_config()

finetuning_config.num_train_epochs = 1
finetuning_config.learning_rate = 1e-3
finetuning_config.weight_decay = 0.01
finetuning_config.eval_steps = 200
finetuning_config.save_steps = 1000
finetuning_config.logging_steps = 200

Training

model.finetune(dataset=dataset)

🔴 Error

Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199
You are using a CUDA device ('NVIDIA RTX A5000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py310_cu118/cpu_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)

[1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__SCALAR__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
FAILED: cpu_adam.o 
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__SCALAR__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
In file included from /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/Device.h:4,
                 from /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
                 from /usr/local/lib/python3.10/dist-packages/torch/include/torch/extension.h:6,
                 from /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:7:
/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory
   12 | #include <Python.h>
      |          ^~~~~~~~~~
compilation terminated.
[2/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -DBF16_AVAILABLE -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o 
ninja: build stopped: subcommand failed.

And then...

CalledProcessError                        Traceback (most recent call last)
--- omitted ---

   1622 if verbose:
   1623     print(f'Building extension module {name}...', file=sys.stderr)
-> 1624 _run_ninja_build(
   1625     build_directory,
   1626     verbose,
   1627     error_prefix=f"Error building extension '{name}'")

File /usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:1911, in _run_ninja_build(build_directory, verbose, error_prefix)
   1909 if hasattr(error, 'output') and error.output:  # type: ignore[union-attr]
   1910     message += f": {error.output.decode(*SUBPROCESS_DECODE_ARGS)}"  # type: ignore[union-attr]
-> 1911 raise RuntimeError(message) from e

RuntimeError: Error building extension 'cpu_adam'

Any solution to this? I am only using a single GPU.


StochasticRomanAgeev · 2023-10-30T08:12:11Z

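Hi @AayushSameerShah,
You need to install python3-dev to use these libraries: the build stops at "fatal error: Python.h: No such file or directory", and that header ships with the Python development package.
Here is the link to the next installation steps.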
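
Before re-running the fine-tuning, it may help to confirm that the header is now visible to the interpreter you are using. The snippet below is a minimal sketch, not part of xturing or DeepSpeed: the helper name find_python_header is made up for illustration, and it assumes the compiler looks in the include directory reported by the standard-library sysconfig module (in the log above that would be /usr/include/python3.10). On Debian/Ubuntu images the headers usually come from the python3-dev package (apt-get install python3-dev); other distributions use a different package name.

```python
import os
import sysconfig


def find_python_header():
    """Return (path, exists) for the Python.h this interpreter expects.

    Hypothetical helper for illustration only; it just inspects the
    include directory that sysconfig reports for the running Python.
    """
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return header, os.path.isfile(header)


if __name__ == "__main__":
    header, exists = find_python_header()
    if exists:
        print(f"Found {header}; the cpu_adam build should be able to include it.")
    else:
        print(f"Missing {header}; install the Python development headers first.")
```

If the script reports the header as missing even after installing the package, the interpreter running the notebook is probably not the system Python that the package targets, so the headers for that specific Python version would be needed instead.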

StochasticRomanAgeev closed this as completed Oct 31, 2023
