
Running on Fedora 38 - Docs Update #39

Open

mrhillsman opened this issue Jul 22, 2023 · 3 comments

@mrhillsman

For self-hosting, we are told to visit https://refact.ai/docs/self-hosting/ and, after ensuring we have Docker with NVIDIA GPU support, run docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting. Unfortunately, these instructions do not work for me, even though I was able to run the previous release of refact before the recent significant changes. Here is what I was getting when following those instructions:

-- 26 -- WARNING:root:output was:
-- 26 -- - no output -
-- 26 -- WARNING:root:nvidia-smi does not work, that's especially bad for initial setup.
-- 26 -- WARNING:root:Traceback (most recent call last):
-- 26 --   File "/usr/local/lib/python3.8/dist-packages/self_hosting_machinery/scripts/enum_gpus.py", line 17, in query_nvidia_smi
-- 26 --     nvidia_smi_output = subprocess.check_output([
-- 26 --   File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
-- 26 --     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
-- 26 --   File "/usr/lib/python3.8/subprocess.py", line 516, in run
-- 26 --     raise CalledProcessError(retcode, process.args,
-- 26 -- subprocess.CalledProcessError: Command '['nvidia-smi', '--query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu', '--format=csv']' returned non-zero exit status 4.
-- 26 -- 
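For anyone hitting the same error: the check that fails inside the container is just a subprocess call to nvidia-smi. Here is a minimal standalone reproduction, with the command and flags copied verbatim from the traceback above (the surrounding script is a sketch, not the actual enum_gpus.py code). If this works on the host but the container still logs exit status 4, the problem is in the container's GPU wiring rather than in refact itself.

# Minimal reproduction of the GPU check that fails inside the container.
# Command and flags are taken from the traceback above; the rest is a
# simplified sketch, not the real enum_gpus.py.
import subprocess

def query_nvidia_smi_raw() -> str:
    return subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu",
        "--format=csv",
    ]).decode("utf-8")

if __name__ == "__main__":
    print(query_nvidia_smi_raw())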

I can confirm, however, that the function query_nvidia_smi succeeds on the host when enum_gpus.py is imported into Python (tested with 3.8, which is what the Dockerfile uses, and with 3.11). Running the nvidia-smi command with the same flags enum_gpus uses also succeeds:

(refact) [mrhillsman@workstation refact]$ python --version
Python 3.8.17
(refact) [mrhillsman@workstation refact]$ python
Python 3.8.17 (default, Jun  8 2023, 00:00:00) 
[GCC 13.1.1 20230511 (Red Hat 13.1.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> subprocess.check_output(["nvidia-smi", "--query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu", "--format=csv"])
b'pci.bus_id, name, memory.used [MiB], memory.total [MiB], temperature.gpu\n00000000:01:00.0, NVIDIA GeForce RTX 3080, 11 MiB, 10240 MiB, 29\n'
>>> import self_hosting_machinery.scripts.enum_gpus as gpuenum
>>> gpuenum.query_nvidia_smi()
{'gpus': [{'id': '00000000:01:00.0', 'name': 'NVIDIA GeForce RTX 3080', 'mem_used_mb': 11, 'mem_total_mb': 10240, 'temp_celsius': 29}]}
>>> exit()
(refact) [mrhillsman@workstation refact]$ nvidia-smi --query-gpu=pci.bus_id,name,memory.used,memory.total,temperature.gpu --format=csv
pci.bus_id, name, memory.used [MiB], memory.total [MiB], temperature.gpu
00000000:01:00.0, NVIDIA GeForce RTX 3080, 11 MiB, 10240 MiB, 29
(refact) [mrhillsman@workstation refact]$ nvidia-smi 
Sat Jul 22 15:46:59 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080        Off | 00000000:01:00.0 Off |                  N/A |
|  0%   29C    P8              13W / 370W |     11MiB / 10240MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2727      G   /usr/bin/gnome-shell                          3MiB |
+---------------------------------------------------------------------------------------+
(refact) [mrhillsman@workstation refact]$ uname -a
Linux workstation 6.3.12-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jul  6 04:05:18 UTC 2023 x86_64 GNU/Linux
(refact) [mrhillsman@workstation refact]$ cat /etc/os-release 
NAME="Fedora Linux"
VERSION="38 (Workstation Edition)"
ID=fedora
VERSION_ID=38
VERSION_CODENAME=""
PLATFORM_ID="platform:f38"
PRETTY_NAME="Fedora Linux 38 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:38"
DEFAULT_HOSTNAME="fedora"
HOME_URL="https://fedoraproject.org/"
DOCUMENTATION_URL="https://docs.fedoraproject.org/en-US/fedora/f38/system-administrators-guide/"
SUPPORT_URL="https://ask.fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=38
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=38
SUPPORT_END=2024-05-14
VARIANT="Workstation Edition"
VARIANT_ID=workstation
[mrhillsman@workstation refact-ai]$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Memory protection checking:     actual (secure)
Max kernel policy version:      33
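
For reference, the dict returned by query_nvidia_smi() above can be recovered from the raw CSV output with a few lines of parsing: a header row, then one comma-separated row per GPU, with the MiB suffixes stripped. The snippet below is only a sketch of that mapping, not the actual enum_gpus.py implementation; it reproduces the exact output shown in the session above.

# Sketch: map the nvidia-smi CSV output shown above onto the dict shape
# returned by query_nvidia_smi(); not the actual enum_gpus.py code.
def parse_nvidia_smi_csv(csv_text: str) -> dict:
    gpus = []
    rows = [ln.strip() for ln in csv_text.splitlines() if ln.strip()]
    for row in rows[1:]:  # skip the CSV header row
        bus_id, name, mem_used, mem_total, temp = [f.strip() for f in row.split(",")]
        gpus.append({
            "id": bus_id,
            "name": name,
            "mem_used_mb": int(mem_used.split()[0]),    # "11 MiB" -> 11
            "mem_total_mb": int(mem_total.split()[0]),  # "10240 MiB" -> 10240
            "temp_celsius": int(temp),
        })
    return {"gpus": gpus}

example = (
    "pci.bus_id, name, memory.used [MiB], memory.total [MiB], temperature.gpu\n"
    "00000000:01:00.0, NVIDIA GeForce RTX 3080, 11 MiB, 10240 MiB, 29\n"
)
print(parse_nvidia_smi_csv(example))
# -> {'gpus': [{'id': '00000000:01:00.0', 'name': 'NVIDIA GeForce RTX 3080',
#               'mem_used_mb': 11, 'mem_total_mb': 10240, 'temp_celsius': 29}]}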

I would have created a PR for the documentation change, but I do not see a repo for the site documentation. Here is the command I was able to run successfully, which I recommend adding to the documentation, either under Fedora 38 specifically or under RPM-based OSs in general:

podman run -d -it --gpus 0 --security-opt=label=disable -p 8008:8008 -v perm_storage:/perm_storage smallcloud/refact_self_hosting
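
A note on why --security-opt=label=disable appears in the command above: sestatus shows SELinux enforcing, and on Fedora that label confinement is a likely reason the container cannot reach the host's NVIDIA devices even though nvidia-smi works fine on the host. The snippet below is a hypothetical pre-flight check, not part of refact, that flags this situation before the container is started.

# Hypothetical pre-flight check (not part of refact): warn when SELinux is
# enforcing, since that is when --security-opt=label=disable (as used above)
# tends to be needed, and confirm nvidia-smi works on the host.
import subprocess
from pathlib import Path

def selinux_enforcing() -> bool:
    enforce = Path("/sys/fs/selinux/enforce")  # SELinuxfs mount reported by sestatus
    return enforce.exists() and enforce.read_text().strip() == "1"

def host_gpu_visible() -> bool:
    try:
        subprocess.check_output(["nvidia-smi", "-L"])  # lists GPUs; raises on failure
        return True
    except (OSError, subprocess.CalledProcessError):
        return False

if __name__ == "__main__":
    if selinux_enforcing():
        print("SELinux is enforcing: the container may need "
              "--security-opt=label=disable to see the GPU.")
    print("host sees GPU:", host_gpu_visible())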

@olegklimov
Contributor

Thanks for reporting!

@olegklimov
Contributor

We have a docs repository:

https://github.com/smallcloudai/web_docs_refact_ai

@mrhillsman
Author

mrhillsman commented Feb 21, 2024

Thanks @olegklimov, I will submit a PR there soon; apologies for the delay. Once I have an open PR/issue there, I'll reference it here and close this issue.
