About "nvidia driver not found" #6

Open
blackline0911 opened this issue May 4, 2024 · 3 comments

Comments

@blackline0911

Hello to the organizers and fellow participants:
My computer uses an AMD Radeon™ RX 6650 XT graphics card, so running the baseline ReID training command gives the following error:

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

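Since I don't have a computer with an NVIDIA card and enough memory at hand, is there any way to solve this?
Also, I'm running Linux under WSL (Windows Subsystem for Linux) on Windows 11. Even after installing faiss-cpu with pip instead of faiss-gpu, planning to run on the CPU, I still get the same error.
The earlier conda and pip install steps were the same as in the baseline.
Here is my computer's hardware configuration: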
[Screenshots: hardware configuration]

The command is:

python3 fast_reid/tools/train_net.py --config-file fast_reid/configs/AICUP/bagtricks_R50-ibn.yml MODEL.DEVICE "cuda:0"
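A minimal sketch of assembling the same command with a device fallback, assuming PyTorch may or may not be installed; `pick_device` and `build_train_cmd` are hypothetical helper names, not part of fastreid. Switching `MODEL.DEVICE` to `cpu` only changes a config value: the traceback in the output below shows the train data loader creating a `torch.cuda.Stream` on its own, so this alone may not be enough to train without a GPU.

```python
import shlex
from typing import List


def pick_device() -> str:
    """Return "cuda:0" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: nothing can run on a GPU anyway
    # ROCm builds of PyTorch also report AMD GPUs through torch.cuda.
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def build_train_cmd(device: str) -> List[str]:
    """Assemble the baseline ReID training command with the given MODEL.DEVICE."""
    return [
        "python3",
        "fast_reid/tools/train_net.py",
        "--config-file",
        "fast_reid/configs/AICUP/bagtricks_R50-ibn.yml",
        "MODEL.DEVICE",
        device,
    ]


if __name__ == "__main__":
    cmd = build_train_cmd(pick_device())
    # shlex.join needs Python 3.8+, so quote manually for the 3.7 env in the log.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning an argument list rather than a single string keeps it usable with `subprocess.run(cmd)` without shell quoting issues.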

The full output is as follows:
Command Line Args: Namespace(config_file='fast_reid/configs/AICUP/bagtricks_R50-ibn.yml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.DEVICE', 'cuda:0'], resume=False)
[05/04 23:09:40 fastreid]: Rank of current process: 0. World size: 1
[05/04 23:09:40 fastreid]: Environment info:


sys.platform linux
Python 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
numpy 1.21.6
fastreid failed to import
FASTREID_ENV_MODULE <not set>
PyTorch 1.13.1+cu117 @/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torch
PyTorch debug build False
GPU available False
Pillow 9.5.0
torchvision 0.14.1+cu117 @/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torchvision
cv2 4.9.0


PyTorch built with:

  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

[05/04 23:09:40 fastreid]: Command line arguments: Namespace(config_file='fast_reid/configs/AICUP/bagtricks_R50-ibn.yml', dist_url='tcp://127.0.0.1:50152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=['MODEL.DEVICE', 'cuda:0'], resume=False)
[05/04 23:09:40 fastreid]: Contents of args.config_file=fast_reid/configs/AICUP/bagtricks_R50-ibn.yml:
_BASE_: ../Base-bagtricks.yml

INPUT:
  SIZE_TRAIN: [256, 256]
  SIZE_TEST: [256, 256]

MODEL:
  BACKBONE:
    WITH_IBN: True
  HEADS:
    POOL_LAYER: GeneralizedMeanPooling

  LOSSES:
    TRI:
      HARD_MINING: False
      MARGIN: 0.0

DATASETS:
  NAMES: ("AICUP",)
  TESTS: ("AICUP",)

SOLVER:
  BIAS_LR_FACTOR: 1.

  IMS_PER_BATCH: 256
  MAX_EPOCH: 60
  STEPS: [30, 50]
  WARMUP_ITERS: 2000

  CHECKPOINT_PERIOD: 1

TEST:
  EVAL_PERIOD: 60 # We didn't provide eval dataset
  IMS_PER_BATCH: 256

OUTPUT_DIR: logs/AICUP_115/bagtricks_R50-ibn

[05/04 23:09:40 fastreid]: Running with full config:
CUDNN_BENCHMARK: True
DATALOADER:
  NUM_INSTANCE: 4
  NUM_WORKERS: 8
  SAMPLER_TRAIN: NaiveIdentitySampler
  SET_WEIGHT: []
DATASETS:
  COMBINEALL: False
  NAMES: ('AICUP',)
  TESTS: ('AICUP',)
INPUT:
  AFFINE:
    ENABLED: False
  AUGMIX:
    ENABLED: False
    PROB: 0.0
  AUTOAUG:
    ENABLED: False
    PROB: 0.0
  CJ:
    BRIGHTNESS: 0.15
    CONTRAST: 0.15
    ENABLED: False
    HUE: 0.1
    PROB: 0.5
    SATURATION: 0.1
  CROP:
    ENABLED: False
    RATIO: [0.75, 1.3333333333333333]
    SCALE: [0.16, 1]
    SIZE: [224, 224]
  FLIP:
    ENABLED: True
    PROB: 0.5
  PADDING:
    ENABLED: True
    MODE: constant
    SIZE: 10
  REA:
    ENABLED: True
    PROB: 0.5
    VALUE: [123.675, 116.28, 103.53]
  RPT:
    ENABLED: False
    PROB: 0.5
  SIZE_TEST: [256, 256]
  SIZE_TRAIN: [256, 256]
KD:
  EMA:
    ENABLED: False
    MOMENTUM: 0.999
  MODEL_CONFIG: []
  MODEL_WEIGHTS: []
MODEL:
  BACKBONE:
    ATT_DROP_RATE: 0.0
    DEPTH: 50x
    DROP_PATH_RATIO: 0.1
    DROP_RATIO: 0.0
    FEAT_DIM: 2048
    LAST_STRIDE: 1
    NAME: build_resnet_backbone
    NORM: BN
    PRETRAIN: True
    PRETRAIN_PATH:
    SIE_COE: 3.0
    STRIDE_SIZE: (16, 16)
    WITH_IBN: True
    WITH_NL: False
    WITH_SE: False
  DEVICE: cuda:0
  FREEZE_LAYERS: []
  HEADS:
    CLS_LAYER: Linear
    EMBEDDING_DIM: 0
    MARGIN: 0.0
    NAME: EmbeddingHead
    NECK_FEAT: before
    NORM: BN
    NUM_CLASSES: 0
    POOL_LAYER: GeneralizedMeanPooling
    SCALE: 1
    WITH_BNNECK: True
  LOSSES:
    CE:
      ALPHA: 0.2
      EPSILON: 0.1
      SCALE: 1.0
    CIRCLE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    COSFACE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    FL:
      ALPHA: 0.25
      GAMMA: 2
      SCALE: 1.0
    NAME: ('CrossEntropyLoss', 'TripletLoss')
    TRI:
      HARD_MINING: False
      MARGIN: 0.0
      NORM_FEAT: False
      SCALE: 1.0
  META_ARCHITECTURE: Baseline
  PIXEL_MEAN: [123.675, 116.28, 103.53]
  PIXEL_STD: [58.395, 57.120000000000005, 57.375]
  QUEUE_SIZE: 8192
  WEIGHTS:
OUTPUT_DIR: logs/AICUP_115/bagtricks_R50-ibn
SOLVER:
  AMP:
    ENABLED: True
  BASE_LR: 0.00035
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 1
  CLIP_GRADIENTS:
    CLIP_TYPE: norm
    CLIP_VALUE: 5.0
    ENABLED: False
    NORM_TYPE: 2.0
  DELAY_EPOCHS: 0
  ETA_MIN_LR: 1e-07
  FREEZE_ITERS: 0
  GAMMA: 0.1
  HEADS_LR_FACTOR: 1.0
  IMS_PER_BATCH: 256
  MAX_EPOCH: 60
  MOMENTUM: 0.9
  NESTEROV: False
  OPT: Adam
  SCHED: MultiStepLR
  STEPS: [30, 50]
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 2000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0005
  WEIGHT_DECAY_BIAS: 0.0005
  WEIGHT_DECAY_NORM: 0.0005
TEST:
  AQE:
    ALPHA: 3.0
    ENABLED: False
    QE_K: 5
    QE_TIME: 1
  EVAL_PERIOD: 60
  FLIP:
    ENABLED: False
  IMS_PER_BATCH: 256
  METRIC: cosine
  PRECISE_BN:
    DATASET: Market1501
    ENABLED: False
    NUM_ITER: 300
  RERANK:
    ENABLED: False
    K1: 20
    K2: 6
    LAMBDA: 0.3
  ROC:
    ENABLED: False
[05/04 23:09:40 fastreid]: Full config saved to /mnt/c/Users/kevin/Desktop/ai_cup/AICUP_Baseline_BoT-SORT/logs/AICUP_115/bagtricks_R50-ibn/config.yaml
/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torchvision/transforms/transforms.py:330: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
"Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. "
Traceback (most recent call last):
  File "fast_reid/tools/train_net.py", line 60, in <module>
    args=(args,),
  File "./fast_reid/fastreid/engine/launch.py", line 71, in launch
    main_func(*args)
  File "fast_reid/tools/train_net.py", line 43, in main
    trainer = DefaultTrainer(cfg)
  File "./fast_reid/fastreid/engine/defaults.py", line 203, in __init__
    data_loader = self.build_train_loader(cfg)
  File "./fast_reid/fastreid/engine/defaults.py", line 402, in build_train_loader
    return build_reid_train_loader(cfg, combineall=cfg.DATASETS.COMBINEALL)
  File "./fast_reid/fastreid/config/config.py", line 265, in wrapped
    return orig_func(**explicit_args)
  File "./fast_reid/fastreid/data/build.py", line 98, in build_reid_train_loader
    pin_memory=True,
  File "./fast_reid/fastreid/data/data_utils.py", line 152, in __init__
    local_rank
  File "/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torch/cuda/streams.py", line 36, in __new__
    with torch.cuda.device(device):
  File "/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torch/cuda/__init__.py", line 287, in __enter__
    self.prev_idx = torch.cuda.current_device()
  File "/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torch/cuda/__init__.py", line 552, in current_device
    _lazy_init()
  File "/home/kevin/anaconda3/envs/bot/lib/python3.7/site-packages/torch/cuda/__init__.py", line 229, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
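The traceback shows where CUDA is first touched: while the train loader is being built, `fastreid/data/data_utils.py` (line 152) constructs a `torch.cuda.Stream`, and creating a stream initializes CUDA immediately. That is why switching to `faiss-cpu` makes no difference. Below is a hypothetical guard illustrating the pattern a CPU fallback would need; `make_copy_stream` is an invented name, not fastreid's code, and the rest of that loader would likely need similar checks wherever it uses the stream.

```python
from typing import Optional


def make_copy_stream(local_rank: int = 0) -> Optional[object]:
    """Create a CUDA side stream for asynchronous host-to-device copies.

    Returns None instead of raising when no usable GPU is present, so the
    caller can fall back to plain synchronous batch handling on the CPU.
    """
    try:
        import torch
    except ImportError:
        return None  # PyTorch is not installed at all
    if not torch.cuda.is_available():
        # Constructing torch.cuda.Stream here would run torch.cuda._lazy_init()
        # and raise "Found no NVIDIA driver on your system".
        return None
    return torch.cuda.Stream(device=local_rank)


if __name__ == "__main__":
    stream = make_copy_stream(0)
    if stream is None:
        print("no usable GPU: skipping the CUDA side stream")
    else:
        print("using a CUDA side stream for prefetching")
```

Checking `torch.cuda.is_available()` first is the key design choice: unlike creating a stream, it returns `False` rather than raising when no driver is found.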

@MuennL

MuennL commented May 6, 2024

"nvidia driver not found" means the NVIDIA driver is not installed. You can check the official documentation for the driver version that matches the GPU you are using.
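A quick way to see whether any NVIDIA driver is visible is to look for the `nvidia-smi` tool that ships with it. A minimal standard-library sketch; `find_nvidia_smi` is a hypothetical name, and the WSL location `/usr/lib/wsl/lib` is an assumption based on where the Windows-side driver usually exposes its Linux tools. A missing `nvidia-smi` is a strong hint, not proof, that no driver is installed.

```python
import os
import shutil
from typing import Optional

# Assumed WSL 2 location of the Windows NVIDIA driver's Linux tools;
# verify this on your own machine.
WSL_DRIVER_DIR = "/usr/lib/wsl/lib"


def find_nvidia_smi() -> Optional[str]:
    """Return the path to nvidia-smi if an NVIDIA driver appears to be installed."""
    path = shutil.which("nvidia-smi")  # search the directories on PATH
    if path is not None:
        return path
    candidate = os.path.join(WSL_DRIVER_DIR, "nvidia-smi")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    found = find_nvidia_smi()
    print(found if found else "nvidia-smi not found: no NVIDIA driver is visible")
```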

@ricky-696
Owner

ricky-696 commented May 9, 2024

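Bro~

You're using an AMD graphics card, so why would you be installing the NVIDIA driver?

I'm not very familiar with AMD myself, but PyTorch does have a version that supports AMD; I believe you need to install additional ROCm-related packages and so on. This blog explains it in detail. Give it another try, good luck~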

@aquastripe

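AMD's ROCm is the counterpart of NVIDIA's CUDA.
Your graphics card happens to support ROCm:
https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html
See the official site for how to install PyTorch+ROCm:
https://rocm.docs.amd.com/projects/install-on-linux/en/develop/how-to/3rd-party/pytorch-install.html

Note that PyTorch+ROCm currently does not support Windows, and your graphics card can use the ROCm runtime but is not supported by the full SDK.

Don't train on the CPU; you could try Colab instead.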
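To confirm which GPU toolkit an installed PyTorch wheel targets, `torch.version.cuda` and `torch.version.hip` can be inspected: CUDA wheels such as the `1.13.1+cu117` build in the log above set the former, ROCm wheels set the latter. ROCm builds expose AMD GPUs through the same `torch.cuda` API, so a config value like `MODEL.DEVICE "cuda:0"` would not need to change. A minimal sketch; `describe_torch_backend` is a hypothetical helper.

```python
from typing import Dict, Optional


def describe_torch_backend() -> Dict[str, Optional[object]]:
    """Report whether PyTorch is installed, which GPU toolkit it targets, and GPU visibility."""
    try:
        import torch
    except ImportError:
        return {"installed": False, "torch": None, "cuda": None, "hip": None,
                "gpu_available": False}
    return {
        "installed": True,
        "torch": str(torch.__version__),
        # Set on CUDA (NVIDIA) builds, e.g. "11.7"; None otherwise.
        "cuda": getattr(torch.version, "cuda", None),
        # Set on ROCm (AMD) builds; None on CUDA or CPU-only builds.
        "hip": getattr(torch.version, "hip", None),
        # ROCm builds also answer through torch.cuda, so this covers AMD GPUs too.
        "gpu_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    for key, value in describe_torch_backend().items():
        print(f"{key}: {value}")
```

On the environment in this issue, a `cuda` value with `hip` set to `None` would confirm that a CUDA-only wheel is installed, which cannot drive the RX 6650 XT.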
