Stable Diffusion / ROCm / PyTorch Setup

Hi!

I’m trying to get Stable Diffusion running on my FW16 (with the 7700S), but I’m having some trouble. I’ve tried to follow this guide (Installing ROCm / HIPLIB on Ubuntu 22.04 - #2 by cepth), using ROCm 5.7 on Ubuntu 22.04, but whenever I run anything CUDA-related I get RuntimeError: No HIP GPUs are available. I’m a bit new to CUDA/torch/ML in general, so I’m not familiar with the details.

It doesn’t seem that gfx1102 is officially supported in this (or any?) version of ROCm, but I was wondering if there were any unofficial workarounds or if anyone had managed to sort it out on their own machine.

Thanks!

Can you try installing ROCm 6.x? And, are your environment variables properly set up?

I’ve run Stable Diffusion (A1111) on the 7700S with no problems. Which SD implementation are you trying to install? If you’re using A1111, be sure to follow the ROCm-specific instructions.

I’ll try ROCm 6.1 now (though it wasn’t working earlier for me).

I have a script to set some environment variables, and it currently looks like this (I use fish):

set -x HSA_OVERRIDE_GFX_VERSION 11.0.2
set -x HCC_AMDGPU_TARGET gfx1102
set -x PYTORCH_ROCM_ARCH gfx1102
set -x AMDGPU_TARGETS gfx1102
set -x TRITON_USE_ROCM ON

set -x ROCM_PATH /opt/rocm-5.7.0
set -x ROCR_VISIBLE_DEVICES 0
set -x HIP_VISIBLE_DEVICES 0
set -x USE_CUDA 0

I’m using the A1111 version and (trying to) follow the ROCm-specific instructions (under the “Running natively” header).

I got it to work by setting the device ID to 1100 instead of 1103.

In my guide, I set all the device related versions to 11.0.0/gfx1100 (see step 7).

As you noted, there’s no official support in ROCm for any consumer cards except the RX 7900 XTX/XT (which are gfx1100).

It’s just not going to work if you set it to 11.0.2/gfx1102.

Ah cool! Hadn’t put two and two together and realised you wrote that guide. Thanks :slight_smile:

I’ve now also tried ROCm 6.1, as well as using 1100/11.0.0 for all the env vars, but still no luck so far. I’ve noticed that clinfo is showing 0 devices, so maybe that’s where the problem is; I’ll check tomorrow.

When you run rocm-smi, does anything come up?

Additionally, if you’re installing/switching between versions of ROCm be sure to reboot after installation.

I’ve tried with ROCm 6.1(.0) and the following env vars:

set -x HSA_OVERRIDE_GFX_VERSION 11.0.0
set -x HCC_AMDGPU_TARGET gfx1100
set -x PYTORCH_ROCM_ARCH gfx1100
set -x AMDGPU_TARGETS gfx1100
set -x TRITON_USE_ROCM ON

set -x ROCM_PATH /opt/rocm-6.1.0
set -x ROCR_VISIBLE_DEVICES 0
set -x HIP_VISIBLE_DEVICES 0
set -x USE_CUDA 0

I’ve been (mostly) following the instructions in the README for A1111, except that instead of the final command I’m installing torch into the venv manually from the ROCm repo with pip3 install torch==2.1.2 torchvision==0.16.1 -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.1/, and then running launch.py with the suggested arguments. I’m still getting errors indicating that my GPU isn’t being detected.

When I run rocm-smi, I get:

=========================================== ROCm System Management Interface ===========================================
===================================================== Concise Info =====================================================
Device  [Model : Revision]    Temp    Power    Partitions      SCLK    MCLK     Fan    Perf  PwrCap       VRAM%  GPU%
        Name (20 chars)       (Edge)  (Avg)    (Mem, Compute)                                                       
========================================================================================================================
0       [0x0007 : 0xc1]       33.0°C  24.0W    N/A, N/A        803Mhz  96Mhz    29.8%  auto  100.0W         0%   0% 
        0x7480                                                                                                      
1       [0x0005 : 0xc1]       35.0°C  21.048W  N/A, N/A        None    1000Mhz  0%     auto  Unsupported   83%   5% 
        0x15bf                                                                                                      
========================================================================================================================
================================================= End of ROCm SMI Log ==================================================

Couple of questions:

  1. When you installed ROCm, you’re sure you specified the correct use cases? I.e. sudo amdgpu-install --usecase=graphics,rocm,hip,mllib. Are you using the amdgpu-install method, or another one?
  2. I’m also a Fish shell user, but A1111’s default launcher is going to use the Bash shell. Note how the first line (aka the “shebang line”) specifies Bash shell. You’re going to have to add these environment variables to your .bashrc file, because it’s the Bash shell that’s actually executing the launch script.
  3. If you get the chance, please run that PyTorch benchmarking suite I mentioned.

  1. Yes, I used exactly amdgpu-install --usecase=graphics,rocm,hip,mllib and rebooted afterwards.
  2. With set -x the environment variables should propagate to child processes anyway, but I added them to .bashrc to be sure, ran it again, and the same error occurred.
  3. I got RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx. In general, I seem to be getting errors whenever torch._C._cuda_init() is run.

Thanks for your help so far, it’s much appreciated :slight_smile:

With set -x the environment variables should propagate to child processes anyway, but I added them to .bashrc to be sure, ran it again, and the same error occurred.

I’m not sure this is the case. If you run the env command in Fish shell, and then explicitly run env in Bash shell, you’ll find the outputs differ.

After you added the env variables to .bashrc, did you run source ~/.bashrc? More generally, could you confirm that running env from within Bash shell produces the relevant env variables?
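For reference, the Fish set -x lines translate to Bash exports along these lines for ~/.bashrc (the values assume the gfx1100/11.0.0 override and the ROCm 6.1.0 install path discussed above):

```shell
# Bash equivalents of the Fish `set -x` variables, for ~/.bashrc
# (values assume the gfx1100/11.0.0 override and ROCm 6.1.0 discussed above)
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export HCC_AMDGPU_TARGET=gfx1100
export PYTORCH_ROCM_ARCH=gfx1100
export AMDGPU_TARGETS=gfx1100
export TRITON_USE_ROCM=ON
export ROCM_PATH=/opt/rocm-6.1.0
export ROCR_VISIBLE_DEVICES=0
export HIP_VISIBLE_DEVICES=0
export USE_CUDA=0
```

After editing, run source ~/.bashrc (or open a fresh terminal) and confirm with env | grep -E 'ROCM|HSA|GFX' that the values actually show up inside Bash.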


Are you sure that you’re installing the ROCm version of PyTorch within the venv that A1111 uses? To confirm (assuming you’re using Fish):

  1. Navigate to the directory where you’ve cloned A1111
  2. Activate the venv with source venv/bin/activate.fish
  3. Run python3 -m pip list

Can you post the output?

More generally, could you confirm that running env from within Bash shell produces the relevant env variables?

When I spawn a bash shell from my fish shell (by typing bash in fish), then the env variables are all available.


The output I get from python3 -m pip list is:

Package                   Version
------------------------- ------------
accelerate                0.21.0
aenum                     3.1.15
aiofiles                  23.2.1
aiohttp                   3.9.5
aiosignal                 1.3.1
altair                    5.3.0
antlr4-python3-runtime    4.9.3
anyio                     3.7.1
async-timeout             4.0.3
attrs                     23.2.0
blendmodes                2022
certifi                   2024.6.2
charset-normalizer        3.3.2
clean-fid                 0.1.35
click                     8.1.7
clip                      1.0
contourpy                 1.2.1
cycler                    0.12.1
deprecation               2.1.0
diskcache                 5.6.3
einops                    0.4.1
exceptiongroup            1.2.1
facexlib                  0.3.0
fastapi                   0.94.0
ffmpy                     0.3.2
filelock                  3.15.4
filterpy                  1.4.5
fonttools                 4.53.0
frozenlist                1.4.1
fsspec                    2024.6.1
ftfy                      6.2.0
gitdb                     4.0.11
GitPython                 3.1.32
gradio                    3.41.2
gradio_client             0.5.0
h11                       0.12.0
httpcore                  0.15.0
httpx                     0.24.1
huggingface-hub           0.23.4
idna                      3.7
imageio                   2.34.2
importlib_resources       6.4.0
inflection                0.5.1
Jinja2                    3.1.4
jsonmerge                 1.8.0
jsonschema                4.22.0
jsonschema-specifications 2023.12.1
kiwisolver                1.4.5
kornia                    0.6.7
lark                      1.1.2
lazy_loader               0.4
lightning-utilities       0.11.3.post0
llvmlite                  0.43.0
MarkupSafe                2.1.5
matplotlib                3.9.0
mpmath                    1.3.0
multidict                 6.0.5
networkx                  3.3
numba                     0.60.0
numpy                     1.26.2
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.20.5
nvidia-nvjitlink-cu12     12.5.40
nvidia-nvtx-cu12          12.1.105
omegaconf                 2.2.3
open-clip-torch           2.20.0
opencv-python             4.10.0.84
orjson                    3.10.5
packaging                 24.1
pandas                    2.2.2
piexif                    1.1.3
Pillow                    9.5.0
pillow-avif-plugin        1.4.3
pip                       22.0.2
protobuf                  3.20.0
psutil                    5.9.5
pydantic                  1.10.17
pydub                     0.25.1
pyparsing                 3.1.2
python-dateutil           2.9.0.post0
python-multipart          0.0.9
pytorch-lightning         1.9.4
pytz                      2024.1
PyWavelets                1.6.0
PyYAML                    6.0.1
referencing               0.35.1
regex                     2024.5.15
requests                  2.32.3
resize-right              0.0.2
rpds-py                   0.18.1
safetensors               0.4.2
scikit-image              0.21.0
scipy                     1.14.0
semantic-version          2.10.0
sentencepiece             0.2.0
setuptools                69.5.1
six                       1.16.0
smmap                     5.0.1
sniffio                   1.3.1
spandrel                  0.1.6
starlette                 0.26.1
sympy                     1.12.1
tifffile                  2024.6.18
timm                      1.0.7
tokenizers                0.13.3
tomesd                    0.1.3
toolz                     0.12.1
torch                     2.3.1
torchdiffeq               0.2.3
torchmetrics              1.4.0.post0
torchsde                  0.2.6
torchvision               0.18.1
tqdm                      4.66.4
trampoline                0.1.2
transformers              4.30.2
triton                    2.3.1
typing_extensions         4.12.2
tzdata                    2024.1
urllib3                   2.2.2
uvicorn                   0.30.1
wcwidth                   0.2.13
websockets                11.0.3
yarl                      1.9.4

I did end up running the webui with --skip-torch-cuda-test just to check whether that was working, and I think at least a few of the packages were installed then.

Right, I’ve tried again from scratch with ROCm 6.0.0, updating the env vars appropriately, making sure to reboot after reinstallation, etc.

Now when I run python3 -m pip list, I get:

Package             Version
------------------- --------------
filelock            3.13.1
fsspec              2024.2.0
Jinja2              3.1.3
MarkupSafe          2.1.5
mpmath              1.3.0
networkx            3.2.1
numpy               1.26.3
pillow              10.2.0
pip                 22.0.2
pytorch-triton-rocm 2.3.1
setuptools          59.6.0
sympy               1.12
torch               2.3.1+rocm6.0
torchaudio          2.3.1+rocm6.0
torchvision         0.18.1+rocm6.0
typing_extensions   4.9.0

which includes pytorch-triton-rocm and torch 2.3.1+rocm6.0, but the problem persists.

So I think I have a sense of what’s going wrong.

The first pip list output you sent was with the venv activated, but it shows the wrong (non-ROCm) build of PyTorch: the torch entry has no +rocm suffix.

The second pip list output has the correct torch version, but given the lack of any of the normal packages installed with A1111 (like gradio, which provides the UI), I’m guessing the second pip list was run with no venv activated?

When the venv is activated, your shell (in Fish) will show:

(venv) username@machine-name ...

Try activating the venv from within the A1111 directory, and then install the ROCm versions of torch and torchvision.
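A quick way to tell the two builds apart is the version suffix: the ROCm wheel reports itself as torch 2.3.1+rocm6.0, while the default PyPI wheel is just torch 2.3.1. A minimal check along these lines (the helper name is mine, not part of A1111 or PyTorch, and the torch import is guarded since it assumes torch is installed in the active venv):

```python
# Check whether the active environment has a ROCm build of torch.
# `is_rocm_build` is an illustrative helper name, not an A1111/PyTorch API.
def is_rocm_build(version: str) -> bool:
    """True if a torch version string indicates a ROCm wheel, e.g. '2.3.1+rocm6.0'."""
    return "+rocm" in version

if __name__ == "__main__":
    try:
        import torch
        print("torch", torch.__version__, "| ROCm build:", is_rocm_build(torch.__version__))
        print("HIP runtime:", torch.version.hip)   # None on non-ROCm builds
        print("GPU visible:", torch.cuda.is_available())
    except ImportError:
        print("torch is not installed in this environment")
```

If torch.version.hip prints None, the venv has a non-ROCm wheel, no matter what the ROCm environment variables are set to.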

I should probably provide a final update on this (for those who find this on Google when trying to solve a similar problem…).

The problem was that I didn’t have ROCm installed properly. Reboot between installation changes, make sure all the versions are consistent, and it should be fine. I now have everything working with 6.0 and can run A1111 and ComfyUI with no problems.

Thank you for all your help cepth!

Glad to hear it!

ROCm 6.2 (just released) officially supports Ubuntu 24.04 now as well.

Can anybody shed some light on this? I’m having similar issues with PyTorch.

The python script from AMD ROCm for local training and inferencing - #3 by Spirosbond fails on torch.cuda.is_available() when I run it:

$ poetry run python3 checkrocm.py

Checking ROCM support...
GOOD: ROCM devices found: 2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user gianpa is in `render` and `video` groups.
BAD: PyTorch ROCM support NOT found.
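For reference, the checks that script performs boil down to something like this sketch (the helper name is hypothetical, and the torch part is guarded since it needs torch installed):

```python
# Sketch of the kind of checks the linked script performs (helper name is mine).
import subprocess

def in_required_groups(groups_output: str) -> bool:
    """True if the output of the `groups` command lists both render and video."""
    return {"render", "video"} <= set(groups_output.split())

if __name__ == "__main__":
    groups = subprocess.run(["groups"], capture_output=True, text=True).stdout
    print("render/video groups:", "GOOD" if in_required_groups(groups) else "BAD")
    try:
        import torch
        print("torch.version.hip:", torch.version.hip)   # None => not a ROCm build
        print("torch.cuda.is_available():", torch.cuda.is_available())
    except ImportError:
        print("torch not installed in this environment")
```

A "BAD: PyTorch ROCM support NOT found" result alongside working rocminfo/rocm-smi output usually points at the torch wheel itself rather than the driver.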

I installed PyTorch following the steps in Install PyTorch for ROCm — Use ROCm on Radeon GPUs, specifically:

  • pytorch_triton_rocm-3.1.0+rocm6.3.2.b253a53766-cp310-cp310-linux_x86_64.whl
  • torch-2.5.1+rocm6.3.2-cp310-cp310-linux_x86_64.whl

Info

Framework 16"

$ uname -a
Linux gianpa-frm16 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.2 LTS"

Newer versions of the Ubuntu kernel didn’t load the login screen.

rocminfo

ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   5743
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            16
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Memory Properties:
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    63580336(0x3ca28b0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    63580336(0x3ca28b0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    63580336(0x3ca28b0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 4
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    63580336(0x3ca28b0) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1100
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon™ RX 7700S
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      32(0x20) KB
    L2:                      2048(0x800) KB
  Chip ID:                 29824(0x7480)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2208
  BDFID:                   768
  Internal Node ID:        1
  Compute Unit:            32
  SIMDs per CU:            2
  Shader Engines:          2
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Memory Properties:
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 550
  SDMA engine uCode::      16
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    8372224(0x7fc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Recommended Granule:0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1100
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***

rocm-smi


============================================= ROCm System Management Interface =============================================
======================================================= Concise Info =======================================================
Device  Node  IDs              Temp    Power    Partitions          SCLK    MCLK     Fan    Perf  PwrCap       VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Avg)    (Mem, Compute, ID)
============================================================================================================================
0       1     0x7480,   19047  33.0°C  25.0W    N/A, N/A, 0         808Mhz  456Mhz   29.8%  auto  100.0W       0%     0%
1       2     0x15bf,   11294  36.0°C  27.178W  N/A, N/A, 0         None    2800Mhz  0%     auto  Unsupported  17%    0%
============================================================================================================================
=================================================== End of ROCm SMI Log ====================================================

Thanks a lot :pray:

Hey GF,
I had to reinstall PyTorch from scratch on my FW16 with Arch Linux (EndeavourOS) last week, and something changed with the 6.2 or 6.3 release, so the AMD guide didn’t work for me either.
In the end I only managed to get it working with the PyTorch command here, after choosing the “Preview (Nightly)” build.
Maybe you can try that out.
Spiros