gpt-oss-120b large context stalls during llama.cpp checkpoints

Hey all,

I am using Ubuntu Server 25.10.04.

I am using distrobox (toolbox) with ROCm 6.4.4, which has amazing performance compared to all other backends for me.

I am constantly refreshing to get the latest version as well.

Everything works great until I use a larger context, think 60k or above. Yes, I have available ram, and I have updated the system to be able to utilize full system memory. The exact same prompt works on vulkan backends (although 2x slower) so it’s not the system or my OS. Anyone else running large context with rocm? Unfortunately I have tried for days to fix this and if I cannot I am going to have to return the Desktop which is upsetting. Thanks for reading this.

I haven’t had any issues with llama.cpp compiled from source against the latest ROCm nightly from TheRock. I don’t use toolboxes; I just compile and run on the host. I just ran llama-bench with 64K depth (so it populates the context to 64K tokens) and it worked without issues. Performance-wise it is a bit better than the 6.4.4 toolbox you are using, and performance doesn’t degrade as much as context grows (note that rocWMMA currently causes very quick performance degradation at large contexts, so use a HIP-only build).

Please let me know if you need any help compiling - I can post instructions (just need to organize my notes first).

Here is the result of llama-bench:

```
build/bin/llama-bench -m ~/.cache/llama.cpp/ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4-00001-of-00003.gguf -fa 1 -d 64000 -p 2048 -n 128 -ub 2048 -r 1
```

| model | size | params | backend | ngl | n_ubatch | fa | test | t/s |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 2048 | 1 | pp2048 @ d64000 | 215.90 ± 0.00 |
| gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | ROCm | 99 | 2048 | 1 | tg128 @ d64000 | 28.76 ± 0.00 |

Thank you so much for the response. I tried running llama-bench and it would just hang. :thinking: Maybe it has to do with the version of ROCm I am using. I thought the toolbox provided by GitHub - kyuz0/amd-strix-halo-toolboxes was compiled against a nightly. I’ll do some digging. Out of curiosity, what OS are you running? Hoping I won’t have to deal with kernels to use the nightly.

You probably need to pick up the LR compute WA (workaround). It’s in the Ubuntu 24.04 OEM 6.14 kernel, and you need upgraded GPU firmware.

There is no dependency on the kernel; it will just use whatever AMD drivers are currently installed, since the driver part is not part of the ROCm build. I use Fedora 43 beta, kernel 6.17.3.

Here are the instructions:

Install development tools:

sudo dnf install @c-development @development-tools cmake
sudo dnf install libcurl-devel
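The dnf commands above are Fedora-specific. Since the original poster is on Ubuntu, the rough equivalents would be the following (the package names are my assumption, not from the post above; verify locally):

```shell
# Ubuntu counterparts of the Fedora development packages above (assumed names):
# build-essential ~ @c-development/@development-tools, libcurl4-openssl-dev ~ libcurl-devel
sudo apt install build-essential cmake libcurl4-openssl-dev
```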

Install ROCm

Download the ROCm tarball for the gfx1151 arch from the TheRock repository: https://therock-nightly-tarball.s3.amazonaws.com/index.html
Extract it to /opt/rocm.

Example:

sudo mkdir -p /opt/rocm
sudo chown eugr /opt/rocm
wget https://therock-nightly-tarball.s3.amazonaws.com/therock-dist-linux-gfx1151-7.10.0a20251017.tar.gz
tar xzf therock-dist-linux-gfx1151-7.10.0a20251017.tar.gz -C /opt/rocm
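One possible refinement (my suggestion, not part of the instructions above): extract each nightly into its own versioned directory and point /opt/rocm at it with a symlink, which matches the versioned /opt/rocm-7.10.0a… prefixes that appear later in this thread. A minimal sketch, simulated under a temp directory so it runs without root:

```shell
#!/bin/bash
# Sketch of a versioned-install + symlink layout, simulated under a temp
# directory standing in for /opt. The version string is just an example.
set -eu
prefix=$(mktemp -d)                  # stand-in for /opt
ver=7.10.0a20251017                  # example nightly version
mkdir -p "$prefix/rocm-$ver/bin"     # stand-in for the extracted tarball
# -sfn replaces an existing symlink atomically instead of descending into
# it, so switching to a newer nightly is a one-liner.
ln -sfn "$prefix/rocm-$ver" "$prefix/rocm"
readlink "$prefix/rocm"
```

With this layout, every tool and env script can reference the stable /opt/rocm path while the actual install stays versioned.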

Set up environment variables:

~/rocm-env.sh:

#!/bin/bash

# Set the root for your ROCm installation
export ROCM_PATH=/opt/rocm

# Add ROCm's main binaries and the compiler toolchain to your PATH
export PATH=$ROCM_PATH/bin:$PATH

# Tell the system's dynamic linker where to find ROCm libraries
export LD_LIBRARY_PATH=$ROCM_PATH/lib:$LD_LIBRARY_PATH

Activate and test:

source ~/rocm-env.sh
amd-smi
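To confirm the env script took effect, you can check that tools resolve under $ROCM_PATH/bin. A sketch, simulated with a stub binary in a temp prefix so it runs anywhere (on a real install you would expect `command -v amd-smi` to print /opt/rocm/bin/amd-smi):

```shell
#!/bin/bash
# Simulated check that sourcing the env script puts ROCm tools first in PATH.
set -eu
ROCM_PATH=$(mktemp -d)                 # stand-in for /opt/rocm
mkdir -p "$ROCM_PATH/bin"
printf '#!/bin/sh\necho stub\n' > "$ROCM_PATH/bin/amd-smi"   # stub binary
chmod +x "$ROCM_PATH/bin/amd-smi"
export PATH=$ROCM_PATH/bin:$PATH       # same prepend as the env script above
command -v amd-smi                     # should resolve under $ROCM_PATH/bin
```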

Install rocWMMA (optional)

Clone repository:

git clone https://github.com/ROCm/rocWMMA.git
cd rocWMMA

Configure and build (make sure the ROCm environment variables are set):

CC=/opt/rocm/bin/amdclang CXX=/opt/rocm/bin/amdclang++ cmake -B build . -DROCWMMA_BUILD_TESTS=OFF -DROCWMMA_BUILD_SAMPLES=OFF -DGPU_TARGETS=gfx1151
cmake --build build -- -j16

Install llama.cpp

Clone llama.cpp:

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp

Create a build script.

build_llama.cpp.sh:

#!/bin/bash

export HIP_PLATFORM=amd

# Build without rocWMMA - optimal as of 10/18/2025
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)"     cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-I/home/eugr/llm/rocWMMA/library/include" -DCMAKE_HIP_FLAGS="-I/home/eugr/llm/rocWMMA/library/include"   && cmake --build build --config Release -- -j 16

# Build with rocWMMA
#HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)"     cmake -S . -B build -DGGML_HIP_ROCWMMA_FATTN=ON -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-I/home/eugr/llm/rocWMMA/library/include" -DCMAKE_HIP_FLAGS="-I/home/eugr/llm/rocWMMA/library/include"   && cmake --build build --config Release -- -j 16

Test:

build/bin/llama-cli --list-devices

There is definitely a kernel dependency. You just happen to have a new enough kernel to pick up the LR compute WA. It’s also in Ubuntu, but only in the Ubuntu 24.04 OEM 6.14-1014 kernel or later. And Ubuntu doesn’t have new enough Strix Halo microcode to use it yet; that will come by default soon.

Here’s a microcode snapshot with the matching MES sched microcode for now.

https://launchpad.net/~amd-team/+archive/ubuntu/gfx1151/+build/31380238/+files/amdgpu-firmware-dcn351_2025.10.18.git8b4de42e3noble_all.deb


I am using kernel 6.17.0-5-generic, so I assume I will be ok…

Should work on 6.16.x too

Can you elaborate more on this?
```
#!/bin/bash

export HIP_PLATFORM=amd

# Build without rocWMMA - optimal as of 10/18/2025
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-I/home/eugr/llm/rocWMMA/library/include" -DCMAKE_HIP_FLAGS="-I/home/eugr/llm/rocWMMA/library/include" && cmake --build build --config Release -- -j 16
```
I see you include a local rocWMMA path, even though I thought rocWMMA was not being used for this command.

No. It hasn’t been merged into the Canonical generic 6.17 kernel, only the OEM 6.14 one.

It’s in 6.17.2 IIRC, and Canonical’s 6.17 is 6.17.0.

It’s just a leftover; you can omit it:

HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)"     cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release && cmake --build build --config Release -- -j 16

@Eugr, thank you for your help, I didn’t know this was an alternative. I am getting some build issues regarding HIP. I will have to try later.

If you are getting weird issues like “can’t find device lib” or something like that, set these variables too (adjust for your paths):

export ROCM_PATH=/opt/rocm-7.10.0a20251015
export LD_LIBRARY_PATH=/opt/rocm-7.10.0a20251015/lib
export DEVICE_LIB_PATH=$ROCM_PATH/llvm/amdgcn/bitcode
export HIP_DEVICE_LIB_PATH=$ROCM_PATH/llvm/amdgcn/bitcode
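A quick way to catch the wrong-prefix case before exporting these is to check that the bitcode directory actually exists under the prefix. A sketch of the idea, simulated with a temp dir standing in for the real /opt/rocm-<version> prefix:

```shell
#!/bin/bash
# Sketch: sanity-check a ROCm prefix before exporting the device-lib
# variables. Simulated with a temp dir so it runs on any machine.
set -eu
rocm=$(mktemp -d)                       # stand-in for /opt/rocm-<version>
mkdir -p "$rocm/llvm/amdgcn/bitcode"    # where TheRock keeps the bitcode libs
if [ -d "$rocm/llvm/amdgcn/bitcode" ]; then
  export ROCM_PATH=$rocm
  export DEVICE_LIB_PATH=$ROCM_PATH/llvm/amdgcn/bitcode
  export HIP_DEVICE_LIB_PATH=$ROCM_PATH/llvm/amdgcn/bitcode
  echo "device libs: $HIP_DEVICE_LIB_PATH"
else
  echo "no bitcode dir under $rocm; wrong prefix?" >&2
  exit 1
fi
```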

Haha you read my terminal?

That is exactly the issue I am experiencing, but changing the env variables isn’t working. I reinstalled ROCm to make sure I had the correct permissions. I think I have all the development packages needed.

CMAKE_BUILD_TYPE=Release
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native 
-- HIP and hipBLAS found
-- Including HIP backend
-- ggml version: 0.9.4
-- ggml commit:  03792ad9
-- Configuring done (0.3s)
-- Generating done (0.1s)
-- Build files have been written to: /home/dev/llama.cpp/build
[  2%] Built target xxhash
[  2%] Built target build_info
[  2%] Built target llama-gemma3-cli
[  2%] Built target llama-llava-cli
[  2%] Built target sha256
[  2%] Built target sha1
[  2%] Built target llama-qwen2vl-cli
[  2%] Built target llama-minicpmv-cli
[  5%] Built target ggml-base
[  9%] Built target ggml-cpu
[ 10%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/add-id.cu.o
[ 10%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/arange.cu.o
[ 10%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/acc.cu.o
[ 10%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/argsort.cu.o
[ 10%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/clamp.cu.o
[ 10%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/binbcast.cu.o
[ 10%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/conv2d-dw.cu.o
[ 11%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/argmax.cu.o
[ 12%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/conv2d.cu.o
[ 12%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/concat.cu.o
[ 14%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/conv2d-transpose.cu.o
[ 14%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/convert.cu.o
[ 14%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/count-equal.cu.o
[ 14%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/conv-transpose-1d.cu.o
[ 14%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/cpy.cu.o
[ 14%] Building HIP object ggml/src/ggml-hip/CMakeFiles/ggml-hip.dir/__/ggml-cuda/cross-entropy-loss.cu.o
clang: error: cannot find HIP runtime; provide its path via '--rocm-path', or pass '-nogpuinc' to build without HIP runtime
(the same error repeats, interleaved, once per parallel compile job)

Can you run hipconfig and post the output here? If it’s not in your PATH, include $ROCM_PATH/bin (if you set it before) in your PATH. It should look like this:

eugr@ai:~/llm/llama.cpp$ hipconfig
HIP version: 7.1.25421-c8f7d7bbb4

==hipconfig
HIP_PATH           :/opt/rocm-7.10.0a20251020
ROCM_PATH          :/opt/rocm
HIP_COMPILER       :clang
HIP_PLATFORM       :amd
HIP_RUNTIME        :rocclr
CPP_CONFIG         : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-7.10.0a20251020/include -I/include

==hip-clang
HIP_CLANG_PATH     :/opt/rocm/lib/llvm/bin
AMD clang version 20.0.0git (https://github.com/ROCm/llvm-project.git a7d47b26ca0ec0b3e9e4da83825cace5d761f4bc+PATCHED:7a5435441416dc6f50dd93bb4d00d541132e999a)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm-7.10.0a20251020/lib/llvm/bin
sh: line 1: /opt/rocm/lib/llvm/bin/llc: No such file or directory
hip-clang-cxxflags :
 -O3
hip-clang-ldflags :
--driver-mode=g++ -O3 --hip-link

== Environment Variables
PATH =/opt/rocm/bin:/opt/rocm/bin:/home/eugr/.local/bin:/home/eugr/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin
egrep: warning: egrep is obsolescent; using grep -E
HIP_DEVICE_LIB_PATH=/opt/rocm/llvm/amdgcn/bitcode
LD_LIBRARY_PATH=/opt/rocm/lib:/opt/rocm/lib:

== Linux Kernel
Hostname      :
ai
Linux ai 6.17.3-300.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Oct 15 14:19:22 UTC 2025 x86_64 GNU/Linux

Yeah, I am thinking it had to be my env variables, right?

dev@az-rizz:~$ hipconfig
HIP version: 7.1.25415-b5abb01163

==hipconfig
HIP_PATH           :/opt/rocm
ROCM_PATH          :/opt/rocm
HIP_COMPILER       :clang
HIP_PLATFORM       :amd
HIP_RUNTIME        :rocclr
CPP_CONFIG         : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm/include -I/include

==hip-clang
HIP_CLANG_PATH     :/opt/rocm/lib/llvm/bin
AMD clang version 20.0.0git (https://github.com/ROCm/llvm-project.git a7d47b26ca0ec0b3e9e4da83825cace5d761f4bc+PATCHED:1d3c56e3e837bfa87144aef73e9faad95492b591)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/lib/llvm/bin
sh: 1: /opt/rocm/lib/llvm/bin/llc: not found
hip-clang-cxxflags :
 -O3
hip-clang-ldflags :
--driver-mode=g++ -O3 -Llib --hip-link

== Environment Variables
PATH =/opt/rocm/bin:/opt/rocm/bin:/opt/rocm/bin:/opt/rocm/lib/llvm/bin:/opt/rocm/bin:/opt/rocm/bin:/opt/rocm/bin:/opt/rocm/lib/llvm/bin:/tmp/_MEIEXEDfK/bin:/tmp/_MEIEXEDfK/bin:/opt/rocm/bin:/home/dev/.local/bin:/tmp/_MEIEXEDfK/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/dev/.local/bin:/home/dev/.local/bin:/home/dev/.local/bin:/home/dev/.local/bin
LD_LIBRARY_PATH=/opt/rocm/lib:/opt/rocm-7.10.0a20251015/lib
HIP_PATH=/opt/rocm/hip
HIP_PLATFORM=amd
HIP_DEVICE_LIB_PATH=/opt/rocm-7.10.0a20251015/llvm/amdgcn/bitcode

== Linux Kernel
Hostname      :
az-rizz
Linux az-rizz 6.17.0-5-generic #5-Ubuntu SMP PREEMPT_DYNAMIC Mon Sep 22 10:00:33 UTC 2025 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 25.10
Release:	25.10
Codename:	questing

Is your /opt/rocm a symlink to /opt/rocm-7.10.0a20251015?

If not, then you need to fix your paths and use /opt/rocm everywhere, e.g. HIP_DEVICE_LIB_PATH=/opt/rocm/llvm/amdgcn/bitcode

Good catch. I updated it, but now all I am seeing is this.

ROCM_PATH: /opt/rocm
CMAKE_C_COMPILER: /opt/rocm/llvm/bin/clang
CMAKE_CXX_COMPILER: /opt/rocm/llvm/bin/clang++
lrwxrwxrwx 1 dev video 8 Oct 17 04:33 /opt/rocm/llvm/bin/clang -> clang-20
lrwxrwxrwx 1 dev video 5 Oct 17 04:33 /opt/rocm/llvm/bin/clang++ -> clang
CMAKE_BUILD_TYPE=Release
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native 
CMake Error at /usr/share/cmake-3.31/Modules/CMakeDetermineHIPCompiler.cmake:73 (message):
  CMAKE_HIP_COMPILER is set to the hipcc wrapper:

   /opt/rocm/bin/hipcc

  This is not supported.  Use Clang directly, or let CMake pick a default.
Call Stack (most recent call first):
  ggml/src/ggml-hip/CMakeLists.txt:38 (enable_language)
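A likely fix, based only on the CMake message above (an assumption, not something confirmed in the thread): remove the cached build directory and make sure CMake sees clang rather than the hipcc wrapper, by setting HIPCXX the same way the earlier build script does:

```shell
# Drop the build dir so the cached CMAKE_HIP_COMPILER is discarded, then
# re-configure with HIPCXX pointing at clang from the ROCm toolchain
# (hipconfig -l prints the HIP_CLANG_PATH shown in the output above).
rm -rf build
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
```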

Crazy thing is that I refreshed the toolbox Docker-compiled version of ROCm (not the TheRock one), and now I am not seeing stalls like I did before, so I think there was an underlying bug that got fixed. I am going to have to give this driver a rest, although I really appreciate your help. I might try it again later. I can definitely say either I suck or this is a bit advanced.

This kernel is not going to have the LR compute WA.

I don’t know what that means, but thank you. I am taking a break and will try to do more research on the topic.