Thought about starting a new thread but figured this is relevant here… has anyone successfully gotten llama.cpp or similar working with the Framework 13 on Linux?
- Using an AI 5 340 board in DIY form with Fedora 42
- I’ve followed a number of guides; llama.cpp builds, but it consistently prints the same errors and will not load models:

load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-rpc.so
load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-vulkan.so
load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-cpu.so

- The compilation itself doesn’t error; building with -DGGML_BACKEND_DL=ON does error, so I have not tested that path.
- None of the public discussion I’ve found actually points to a way to fix this.
- I know this seems like a llama.cpp-specific GitHub issue, but it is heavily tied to the hardware configuration, so I’m asking the community.
I’ve followed the desktop guides from @lhl and @Lars_Urban with contributions by @kyuz0 (though I’d rather not use an arbitrary container, and the toolbox tool was not found in the repo anyway).
https://community.frame.work/t/amd-strix-halo-llama-cpp-installation-guide-for-fedora-42/75856
Trying to use Vulkan, not ROCm. I think I’ve done all the steps:
- set up the environment and build
- check devices
- try to load a model
See the commands and output below.
# root user
dnf install gcc.x86_64 # says installed 15.2.1-1
dnf install gcc-c++ # says installed 15.2.1-1
dnf install libstdc++ # says installed 15.2.1-1
dnf install python3-devel
dnf install python3-pip
dnf install mesa-vulkan-drivers.x86_64
dnf install vulkan-tools.x86_64
dnf install vulkan-headers.noarch
dnf install vulkan-loader-devel
dnf install curl.x86_64 # but already there
dnf install curlpp.x86_64 # A C++ wrapper for libcURL
dnf install curlpp-devel # Development files for curlpp
dnf install libcurl-devel # says already installed
dnf install glslc.x86_64
dnf install rocm.noarch
dnf install cmake
sudo grubby --update-kernel=ALL --args='amd_iommu=off amdgpu.gttsize=98304 ttm.pages_limit=25165824' # for 96 GB
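As a sanity check on those two memory arguments (assuming amdgpu.gttsize is in MiB, ttm.pages_limit is in 4 KiB pages, and standard 4 KiB pages on this kernel), they should describe the same size:

```shell
# amdgpu.gttsize=98304 is in MiB; convert to GiB.
echo "$(( 98304 / 1024 )) GiB"
# ttm.pages_limit=25165824 is in 4 KiB pages: pages * 4 KiB -> MiB -> GiB.
echo "$(( 25165824 * 4 / 1024 / 1024 )) GiB"
```

Both work out to 96 GiB, so the two arguments are at least consistent with each other.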
# regular user
cd ~/opt/llama.cpp-vulkan/
git pull
cmake -B build -DGGML_VULKAN=ON && cmake --build build --config Release -j 11
~/opt/llama.cpp-vulkan/build/bin/llama-cli --list-devices
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon 840M Graphics (RADV GFX1152) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
register_backend: registered backend Vulkan (1 devices)
register_device: registered device Vulkan0 (AMD Radeon 840M Graphics (RADV GFX1152))
register_backend: registered backend RPC (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (AMD Ryzen AI 5 340 w/ Radeon 840M)
load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-rpc.so
load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-vulkan.so
load_backend: failed to find ggml_backend_init in ~/opt/llama.cpp-vulkan/build/bin/libggml-cpu.so
Available devices:
Vulkan0: AMD Radeon 840M Graphics (RADV GFX1152) (65877 MiB, 65707 MiB free)
~/opt/llama.cpp-vulkan/build/bin/llama-cli -m /home/<username>/opt/llm_models/models/mistral_models/7B-Instruct-v0.3/model.q8_0.gguf -ngl 99
…followed by a bunch of output that includes the same three errors above and this failure-to-load message:
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: missing tensor 'token_embd.weight'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '~/opt/llm_models/models/mistral_models/7B-Instruct-v0.3/model.q8_0.gguf', try reducing --n-gpu-layers if you're running out of VRAM
main: error: unable to load model
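Side note on the last error: I can’t say for certain, but "missing tensor 'token_embd.weight'" might indicate a truncated or incomplete model file rather than a backend problem. A cheap integrity sketch (shown here on a stand-in file I create for illustration; the same check can be pointed at the real .gguf path) is to confirm the 4-byte GGUF magic at the start of the file:

```shell
# Stand-in file for illustration only; a real check would use the model path,
# e.g. ~/opt/llm_models/models/mistral_models/7B-Instruct-v0.3/model.q8_0.gguf
printf 'GGUF' > /tmp/probe.gguf

# Every valid GGUF file begins with the ASCII magic "GGUF".
magic=$(head -c 4 /tmp/probe.gguf)
[ "$magic" = "GGUF" ] && echo "magic ok"
```

A valid magic doesn’t prove the file is complete, so comparing the on-disk size against the published size for the download would be the next step.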