For self-hosting, the docs at https://refact.ai/docs/self-hosting/ say to run `docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting` after ensuring Docker has NVIDIA GPU support. Unfortunately these instructions do not work for me, even though I was able to run the previous release of refact before the recent significant changes. Here is what I get when following them:
```
-- 26 -- WARNING:root:output was:
-- 26 -- - no output -
-- 26 -- WARNING:root:nvidia-smi does not work, that's especially bad for initial setup.
-- 26 -- WARNING:root:Traceback (most recent call last):
-- 26 --   File "/usr/local/lib/python3.8/dist-packages/self_hosting_machinery/scripts/enum_gpus.py", line 17, in query_nvidia_smi
-- 26 --     nvidia_smi_output = subprocess.check_output([
-- 26 --   File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
-- 26 --     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
-- 26 --   File "/usr/lib/python3.8/subprocess.py", line 516, in run
-- 26 --     raise CalledProcessError(retcode, process.args,
-- 26 -- subprocess.CalledProcessError: Command '['nvidia-smi', '--query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu', '--format=csv']' returned non-zero exit status 4.
-- 26 --
```
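For reference, enum_gpus.py just shells out to nvidia-smi and wraps its exit status, so the CalledProcessError above is nothing more than that non-zero status. A minimal host-side reproduction of the same call (a sketch, guarded so it degrades gracefully on hosts without nvidia-smi):

```shell
# Same query that enum_gpus.py runs via subprocess.check_output.
# Exit status 4 inside the container is nvidia-smi itself failing,
# not a bug in the Python wrapper around it.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu \
               --format=csv
    echo "nvidia-smi exit status: $?"
else
    echo "nvidia-smi not found on this host"
fi
```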
I can confirm, however, that importing enum_gpus.py into Python on the host (tested 3.8, which is what the Dockerfile uses, and 3.11) and calling query_nvidia_smi succeeds. Running the nvidia-smi command with the same flags as enum_gpus.py also succeeds:
```
(refact) [mrhillsman@workstation refact]$ nvidia-smi --query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu --format=csv
pci.bus_id, name, memory.used [MiB], memory.total [MiB], temperature.gpu
00000000:01:00.0, NVIDIA GeForce RTX 3080, 11 MiB, 10240 MiB, 29
(refact) [mrhillsman@workstation refact]$ nvidia-smi
Sat Jul 22 15:46:59 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   29C    P8              13W / 370W |     11MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2727      G   /usr/bin/gnome-shell                          3MiB |
+---------------------------------------------------------------------------------------+
```
```
(refact) [mrhillsman@workstation refact]$ uname -a
Linux workstation 6.3.12-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jul  6 04:05:18 UTC 2023 x86_64 GNU/Linux
(refact) [mrhillsman@workstation refact]$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="38 (Workstation Edition)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora Linux 38 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f38/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
VARIANT="Workstation Edition"
VARIANT_ID=workstation
[mrhillsman@workstation refact-ai]$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33
```
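Since the identical command succeeds on the host and Fedora runs SELinux in enforcing mode (see the sestatus output above), my working theory is that SELinux labeling is what blocks the container's access to the NVIDIA device nodes. A quick host-side check, sketched with a helper name of my own (`selinux_enforcing` is hypothetical; the `/sys/fs/selinux/enforce` node is the standard kernel interface):

```shell
# selinux_enforcing: succeed only if SELinux is present and enforcing.
# On hosts without SELinux the sysfs node simply does not exist.
selinux_enforcing() {
    [ -r /sys/fs/selinux/enforce ] && [ "$(cat /sys/fs/selinux/enforce)" = "1" ]
}

if selinux_enforcing; then
    echo "SELinux is enforcing; try the container with --security-opt label=disable"
else
    echo "SELinux not enforcing (or not present); look elsewhere"
fi
```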
I would have opened a PR for the documentation change, but I don't see a repo for the site documentation. Here is the command I was able to run successfully, which I recommend adding to the documentation, either under Fedora 38 specifically or under RPM-based OSes in general:
```
podman run -d -it --gpus 0 --security-opt=label=disable -p 8008:8008 -v perm_storage:/perm_storage smallcloud/refact_self_hosting
```
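Once the container is up, it's worth confirming the GPU is actually visible from inside it. A sketch of that check follows; `check_gpu_in_container` and the `--filter ancestor=` match are my assumptions about how the container was started, so adjust them if you tagged or named things differently:

```shell
# Look up the running refact container and run nvidia-smi inside it.
# Falls back to a message (rather than a hard failure) when podman
# is not installed or the container is not running.
check_gpu_in_container() {
    if ! command -v podman >/dev/null 2>&1; then
        echo "podman not installed on this host"
        return 0
    fi
    cid=$(podman ps -q --filter ancestor=smallcloud/refact_self_hosting | head -n 1)
    if [ -z "$cid" ]; then
        echo "refact container not running"
        return 0
    fi
    podman exec "$cid" nvidia-smi
}

check_gpu_in_container
```

If SELinux was the culprit, this should now print the same GPU table as on the host instead of failing with a non-zero exit status.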