Installing ROCm / HIPLIB on Ubuntu 22.04

Arav_Agarwal · May 6, 2024, 9:06pm

Hey folks - I’m loving my FrameWork so far, but wanted to get an idea of if anyone has any good tutorials for getting ROCm and HIP installation done on the Framework 16 for Ubuntu 22.04.

I’m currently running into issues with the amdgpu-install tool that AMD provides - it’s not installing the amdgpu-dkms package. I am sure I can resolve this with enough time ( I have a rough idea of what’s going on there ), but what I want to know is:

Does anyone have an installation guide for CuPy / PyTorch that they have successfully used here? I am a bit out of my depth when it comes to understanding driver installations, and any reference ( even if it is literally a manual to RTFM ) would help.
Does anyone know how to do this without disabling external monitors? When I ran amdgpu-install with the usecases I wanted, it disabled my external monitor support. I imagine this is because I didn’t specify the expected usecases, but I’d love to know if there’s a good way to make sure that this works.

Thank you all so much in advance!

cepth · May 7, 2024, 7:24am

I’ve been able to get PyTorch + Tensorflow (as well as TinyGrad) working on the Framework 16.

Assuming you have the discrete GPU module, I would do the following:

Use amdgpu-install. Personally, I installed a number of the additional packages, so my install command looked like: sudo amdgpu-install --usecase=graphics,rocm,hip,mllib. This will require a reboot after installation. I believe including graphics as a usecase should preserve external monitor support (mine work fine).
Verify that you can run rocm-smi (which is the AMD equivalent to nvidia-smi).
Optional (but recommended), install amdgpu_top. This will require that have you Rust installed, but it gives you a very nice interface, and far more detailed information than the vanilla rocm-smi.
I would recommend a good environment manager. Personally, I use mambaforge/miniforge. The solver (for dependencies) is way faster than vanilla Conda.
Create an environment (whether through mambaforge or something simpler, like venv), and activate it.
Install PyTorch from this official ROCm repo. I still have ROCm 6.0.2 installed, but I assume if you download and run amdgpu-install today, it’ll likely use 6.1. Be sure to choose the .whl (Python wheel) from the correct folder, corresponding to the PyTorch edition you’re looking for.
I’ve found that Ubuntu doesn’t always use the discrete GPU. And, officially ROCm does not support the RX 7700S. So, I’ve found that it’s helpful to add the following environment variables to your .bashrc (or you can export these one time):

export HSA_OVERRIDE_GFX_VERSION=11.0.0
export HCC_AMDGPU_TARGET=gfx1100
export PYTORCH_ROCM_ARCH=gfx1100
export TRITON_USE_ROCM=ON

export ROCM_PATH=/opt/rocm-6.0.2
export ROCR_VISIBLE_DEVICES=0
export HIP_VISIBLE_DEVICES=0
export USE_CUDA=0

What do these mean?

If you have a newer version of ROCm, change the ROCM_PATH line to reflect the correct path.
The visible devices lines are so that PyTorch (and other frameworks) use the 7700S, and not the iGPU.
gfx1100 is the architecture for the RX 7900XTX, which is to date the only consumer card that ROCm officially supports. The 7700S has a “real” architecture designation of gfx1102, and if you don’t modify that environment variable, ROCm PyTorch will error out.

Clone this official ROCm benchmarking repo, and run the benchmarks.

Be sure that your Python environment with ROCm PyTorch is active in your terminal.

For example, with my power profile set to “performance” and my laptop plugged in, I run (within that benchmarking repo folder):

python3 micro_benchmarking_pytorch.py --network resnext101 --batch-size 32 --iterations 400 --fp16 1 --compile

(The compile flag activates PyTorch’s pre-compile function, which requires overhead/setup time, but leads to faster runs).

I get this terminal output:

INFO: running forward and backward for warmup.
INFO: running the benchmark..
OK: finished running benchmark..
--------------------SUMMARY--------------------------
Microbenchmark for network : resnext101
Num devices: 1
Dtype: FP16
Mini batch size [img] : 32
Time per mini-batch : 0.365741006731987
Throughput [img/sec] : 87.49360725484475

This should get you up and running with PyTorch. If, for whatever reason, you want to get TensorFlow working, I can dig up my notes. But, be warned that it seems ROCm support for TensorFlow is in a much worse state than PyTorch. For example, you have to install the nightly release. The mainline TensorFlow release will completely fail to run on the 7700S.

Addendum:

I should credit this blog post for some ideas that helped me get my setup working. Note that their setup was smoother/simpler because they’re using an officially supported GPU (the desktop 7900 XTX), and they’re running a much older version of ROCm (5.5).

Arav_Agarwal · May 8, 2024, 12:16pm

Thank you for the detailed information!

In case anyone else finds this - here are some notes I had when going through it. YMMV, and I’m going to edit as I run through it all:

In terms of use-cases, I’m using pretty much all of them, as I also want to get CuPY working for Spacy and figured “why not”. If I find that I will regret this ( likely ) I will put additional information here.
I had to specify 6.0.2 for installation to work with Ubuntu 22.04 and the associated version of the amdgpu-install tool that AMD provides.

EvanG · April 2, 2025, 9:47pm

A few notes for anyone else trying to do this:

You can get amdgpu-install from https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/amdgpu-install.html
I had to add DISABLE_ADDMM_CUDA_LT=1 to my environment variables. I found this fix in this github issue from October, so it looks like a more recent problem than this thread.
float16 is not functional on my machine (for ROCm purposes). I have no idea why, I don’t see anything about it online but it is what it is. I don’t have the error message in front of me but if I remember correctly it was something along the lines of “invalid operation” or “invalid instruction” and some hex number. If anyone knows anything about why this is happening I’d love to know.

cepth · April 20, 2025, 8:42pm

An April 2025 update for everyone:

ROCm 6.4.0 works on Ubuntu 24.04. There was a period where the 6.1 and 6.2 releases were not working (without significant modifications to package sources, essentially adding the 22.04 lists to 24.04). Running PyTorch 2.5.0 and 2.6.0 with ROCm 6.3 would lead to unresolvable HIP runtime errors, which was extremely frustrating.
Thankfully, PyTorch 2.7 and 2.8 (still in “nightly”/beta status as of writing) now have native gfx1102 support, meaning no more need to hack environment variables. However, given those versions of PyTorch haven’t been generally released yet, it’s unlikely that projects/programs that depend on PyTorch have been updated to use those versions.
I’m seeing parity between my older Ubuntu 22.04+ROCm 6.1.2+PyTorch2.5.0 setup with Ubuntu 24.04+ROCm 6.4.0+PyTorch 2.8.0 on most PyTorch workloads, however there are significant performance regressions when running several PyTorch test scripts. E.g. VGG16 (an ancient network at this point) sees a low single-digit percentage point increase in performance (in terms of throughput), but running an EfficientNet B0 sees an almost 40% drop from the ROCm 6.1.2 setup. Hopefully, with the stated goal of there being eventual ROCm-level support for all newish AMD GPUs, these regressions will be fixed.

Spirosbond · May 19, 2025, 9:21pm

To my pleasant surprise I confirm what you describe!!
On my Arch System (EndeavourOS) I am… rocking… ROCm6.4 with the latest Pytorch Nightly (from here) and without hacking the HSA_OVERRIDE_GFX_VERSION and all looks good so far with limited testing!

Here are some scripts I have been using to test my installation in case you find them useful:

# test_rocm.py
import torch, grp, pwd, os, subprocess

devices = []
try:
    print("\n\nChecking ROCM support...")
    result = subprocess.run(["rocminfo"], stdout=subprocess.PIPE)
    cmd_str = result.stdout.decode("utf-8")
    cmd_split = cmd_str.split("Agent ")
    for part in cmd_split:
        item_single = part[0:1]
        item_double = part[0:2]
        if item_single.isnumeric() or item_double.isnumeric():
            new_split = cmd_str.split("Agent " + item_double)
            device = (
                new_split[1]
                .split("Marketing Name:")[0]
                .replace("  Name:                    ", "")
                .replace("\n", "")
                .replace("                  ", "")
                .split("Uuid:")[0]
                .split("*******")[1]
            )
            devices.append(device)
    if len(devices) > 0:
        print("GOOD: ROCM devices found: ", len(devices))
    else:
        print("BAD: No ROCM devices found.")

    print("Checking PyTorch...")
    x = torch.rand(5, 3)
    has_torch = False
    len_x = len(x)
    if len_x == 5:
        has_torch = True
        for i in x:
            if len(i) == 3:
                has_torch = True
            else:
                has_torch = False
    if has_torch:
        print("GOOD: PyTorch is working fine.")
    else:
        print("BAD: PyTorch is NOT working.")

    print("Checking user groups...")
    user = os.getlogin()
    groups = [g.gr_name for g in grp.getgrall() if user in g.gr_mem]
    gid = pwd.getpwnam(user).pw_gid
    groups.append(grp.getgrgid(gid).gr_name)
    if "render" in groups and "video" in groups:
        print("GOOD: The user", user, "is in RENDER and VIDEO groups.")
    else:
        print(
            "BAD: The user",
            user,
            "is NOT in RENDER and VIDEO groups. This is necessary in order to PyTorch use HIP resources",
        )

    if torch.cuda.is_available():
        print("GOOD: PyTorch ROCM support found.")
        t = torch.tensor([5, 5, 5], dtype=torch.int64, device="cuda")
        print("Testing PyTorch ROCM support...")
        if str(t) == "tensor([5, 5, 5], device='cuda:0')":
            print("Everything fine! You can run PyTorch code inside of: ")
            for idx, device in enumerate(devices):
                print(f"---> {device}")
    else:
        print("BAD: PyTorch ROCM support NOT found.")
except:
    print(
        "Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed."
    )

and

# script.py
import random
from time import time
from os import putenv

# putenv("HSA_OVERRIDE_GFX_VERSION", "11.0.0")
# putenv("PYTORCH_ROCM_ARCH", "gfx1100")
# putenv("HCC_AMDGPU_TARGET", "gfx1100")
# putenv("AMD_SERIALIZE_KERNEL", "3")
# putenv("HIP_VISIBLE_DEVICES", "0")
# putenv("ROCM_PATH", "/opt/rocm")
import torch


class SmallModel(torch.nn.Module):
    def __init__(self, in_f) -> None:
        super().__init__()
        self.cnn = torch.nn.Sequential(
            torch.nn.Linear(in_f, 10000),
            torch.nn.ReLU(),
            torch.nn.Linear(10000, 10000),
            torch.nn.ReLU(),
            torch.nn.Linear(10000, 10000),
            torch.nn.ReLU(),
            torch.nn.Linear(10000, 10000),
            torch.nn.ReLU(),
            torch.nn.Linear(10000, 10000),
            torch.nn.ReLU(),
            torch.nn.Linear(10000, 10000),
            torch.nn.ReLU(),
            torch.nn.Linear(10000, 10000),
            torch.nn.ReLU(),
            torch.nn.Linear(10000, 10000),
            torch.nn.ReLU(),
            torch.nn.Linear(10000, 10000),
            torch.nn.ReLU(),
            torch.nn.Linear(10000, 100),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # print(x.device)
        return self.cnn(x)


device = "cuda"
# device = "cpu"

a = torch.randn((100, 100))
a = a.to(device)
m = SmallModel(100)
m = m.to(device)
# Create start and end events
start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)

# Record start event
start_event.record()

# GPU operations you want to time
for _ in range(10):
    b = m(a)

# Record end event
end_event.record()

# Wait for all ops to finish
torch.cuda.synchronize()

# Compute elapsed time in milliseconds
elapsed_time_ms = start_event.elapsed_time(end_event)

print(f"Total time: {elapsed_time_ms:.2f} ms with device {device}")

Topic		Replies	Views
AMD ROCm for local training and inferencing Framework Laptop 16 framework-laptop-16-amd-7040	2	1253	September 29, 2024
Fedora amdgpu-install Framework Laptop 16 framework-laptop-16-amd-7040	6	1420	March 13, 2024
Stable Diffusion / ROCm / PyTorch Setup Linux ubuntu	17	6004	February 14, 2025
Experiments with using ROCM on the FW16 AMD Linux ubuntu	17	1376	April 25, 2025
ROCm on the new Framework 13 Linux bazzite	15	850	June 8, 2025

Installing ROCm / HIPLIB on Ubuntu 22.04

Related topics