We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
from xturing.datasets.text_dataset import TextDataset dataset = TextDataset({ "text": sample["text"], "target": sample["target"] }) # >>> [2023-10-04 11:09:41,704] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
from xturing.models import BaseModel # Load the model model = BaseModel.create('llama2_lora_int8')
finetuning_config = model.finetuning_config() finetuning_config.num_train_epochs = 1 finetuning_config.learning_rate = 1e-3 finetuning_config.weight_decay = 0.01 finetuning_config.eval_steps = 200 finetuning_config.save_steps = 1000 finetuning_config.logging_steps = 200
model.finetune(dataset=dataset)
Using 16bit Automatic Mixed Precision (AMP) GPU available: True (cuda), used: True TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs trainable params: 4194304 || all params: 6742609920 || trainable%: 0.06220594176090199 You are using a CUDA device ('NVIDIA RTX A5000') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0] Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root... Creating extension directory /root/.cache/torch_extensions/py310_cu118/cpu_adam... Detected CUDA files, patching ldflags Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/cpu_adam/build.ninja... Building extension module cpu_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/3] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__SCALAR__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o FAILED: cpu_adam.o c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/local/cuda/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__SCALAR__ -D__ENABLE_CUDA__ -DBF16_AVAILABLE -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o In file included from /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/Device.h:4, from /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:8, from /usr/local/lib/python3.10/dist-packages/torch/include/torch/extension.h:6, from /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:7: /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/python_headers.h:12:10: fatal error: Python.h: No such file or directory 12 | #include <Python.h> | ^~~~~~~~~~ compilation terminated. [2/3] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -DBF16_AVAILABLE -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o ninja: build stopped: subcommand failed.
And then...
CalledProcessError Traceback (most recent call last) --- ommited --- 1622 if verbose: 1623 print(f'Building extension module {name}...', file=sys.stderr) -> 1624 _run_ninja_build( 1625 build_directory, 1626 verbose, 1627 error_prefix=f"Error building extension '{name}'") File /usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:1911, in _run_ninja_build(build_directory, verbose, error_prefix) 1909 if hasattr(error, 'output') and error.output: # type: ignore[union-attr] 1910 message += f": {error.output.decode(*SUBPROCESS_DECODE_ARGS)}" # type: ignore[union-attr] -> 1911 raise RuntimeError(message) from e RuntimeError: Error building extension 'cpu_adam'
Any solution to this? I am only using a single GPU.
The text was updated successfully, but these errors were encountered:
Hi @AayushSameerShah, You need to install python3-dev to use this libraries. Here is the link to next installation steps.
Sorry, something went wrong.
No branches or pull requests
Loading dataset
Loading model
Simple configuration
Training
🔴 Error
And then...
Any solution to this? I am only using a single GPU.
The text was updated successfully, but these errors were encountered: