How's your real-world AI (code generation) on the Framework Desktop?

Super post, thanks Jeff :ok_hand:

Very interesting. I saw some online commentary that ~80GB VRAM / ~48GB system was worth trying, on the basis that holding a larger model is important, but it still needs to be “fed” by the main system (and so one cannot reduce the system down to a negligible level in order to max out the VRAM). Did you try any other ratios?

Ooh yes please!

If your goal is code generation and software engineering productivity, local models on the Framework Desktop are definitely usable, especially with Strix Halo’s large unified memory. The main advantage is being able to run larger models (30B–70B class quantized models) that wouldn’t fit on many discrete GPUs with limited VRAM.

That said, for pure coding performance, Claude and GPT-class cloud models are still significantly ahead of what most local models can deliver today. A £1,500 GPU isn’t strictly required to run local models, but it will generally provide faster inference than the integrated Radeon 8060S.

My recommendation would be: use the employer-funded Claude or Copilot for day-to-day work, and experiment with local models if privacy, control, or learning is important to you. Local AI is already practical for coding assistance, but it’s not yet a full replacement for the top hosted models.

Yes, good thoughts: I will do that in the short term. My brain has decided to have pressing ethical reasons to move away from cloud AI, so I could not tarry forever; I believe I would accept worse reasoning or speed for a perceptible ethical improvement. (How much worse, of course, is the big question: it still needs to be useful.)

Part of my ethical view is the withdrawal of funding: it’s not merely that I don’t want to fund this or that thing, but I also don’t want my employer to do so on my behalf. My ethical stance demands that I remove the funding from my seat, even if someone will take the burden off me.

In the short term, I acknowledge that Anthropic are not too bad, so I can hang on for ~six months if necessary. I think I will order the top-spec FW Desktop in a month or so, when I shall be around to receive the parcel, but of course I will keep researching. There are spotty reports of hardware QC issues, so I will dig into those.

As a broad aside, I can see two different ways in which AI code gen can work:

  1. The addition of a feature is tackled, commit by commit, as one would code manually. Today I did a piece of work in three hours that I think would have taken me over a day previously. A demo bit of data in a backend web controller, a placeholder SVG here, some routing there, some CSS reformatting, some database lookups, all on an iterative basis. That’s ~40 commits, but without much of the old cognitive overhead (e.g. the finer points of the grid layout system in CSS).
  2. A very complex and comprehensive feature requirement, plus AGENTS.md files, to shape how the feature should be made in a one-shot attempt. This is the sort of AI that takes a few hours to run. I am not here yet, and maybe I won’t ever be.

I like the first option because each commit is quick: 100 to 6K tokens apiece. There is minimal waiting, and never so much that I lose focus. The second option is interesting, but there are no intermediate results, and if it gets something wrong, one might have to spend another few hours on another attempt.

My hope is that option one requires less reasoning, and thus it is within the bounds of the locally runnable models; not as good as the frontier models, I warrant, but perfectly good enough for my workflow. Plus, as a bonus, I feel that the direction of development for an artifact is still under my control.

I never tried an 80/48GB split. The BIOS doesn’t directly support that split, so I think you would need the BIOS in Auto mode and then allocate in Linux somehow (I haven’t researched it). I know people often use the Auto mode allocation in the BIOS, but somewhere I read that this can slow the initial loading of models.

80/48 could be useful for loading a larger model of, say, ~40GB while still leaving plenty of VRAM headroom for a large KV cache and a long context length.
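To put rough numbers on that headroom, here is a back-of-the-envelope sketch. The model shape (layer count, KV heads, head dimension) is a made-up example rather than a measurement of any particular model, the "GB" figures from the thread are treated as GiB, and the formula ignores compute buffers and other runtime overhead.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Approximate KV cache size: one K and one V vector per layer per token.

    bytes_per_value=2 assumes an unquantized 16-bit cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value

# Hypothetical model shape, chosen only for illustration.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128

VRAM_GIB = 80    # GPU share of the 80/48 split discussed above
MODEL_GIB = 40   # approximate size of the loaded model weights

def headroom_gib(context_len):
    """VRAM left after the model weights and the KV cache."""
    cache = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, context_len)
    return VRAM_GIB - MODEL_GIB - cache / GIB

for ctx in (8_192, 32_768, 131_072):
    cache_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx) / GIB
    print(f"context {ctx:>7}: KV cache ~{cache_gib:.1f} GiB, "
          f"headroom ~{headroom_gib(ctx):.1f} GiB")
```

With this invented shape the cache costs 256 KiB per token, so even a very long context still fits next to a ~40GB model in an 80GB allocation. Real models differ, so substitute the values from the model's own metadata before relying on the result.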

I’m cleaning up my system documentation. It will document the whole process for basic install, tuning and installing the useful WebUI, TUI and Docker services I’m using. It includes Caddy for reverse proxy, llama.cpp, llama-swap, HF Downloader, and anything else I find useful in my exploration.

I’ll be posting it to a public GitHub repo soon.

-Jeff

Not recommended on Linux:

(and the ttm config is not needed with the latest llama.cpp …)

Your answer raises a basic question I’ve never seen answered definitively: How do you get ROCm installed on the Strix Halo 395+? Your link contains a compatibility matrix for ROCm, and unless it’s out of date, this iGPU is not supported: Radeon 8060S.

I have yet to find anyone who can tell me how to get ROCm installed on this platform without resorting to experimental 3rd-party drivers and configuration.

And “not recommended on Linux”: where does this recommendation come from? The same non-supported ROCm link you provided?

I’d love to get ROCm installed and working to try it vs Vulkan, but I have yet to find any reliable information on setting this up on the Framework Desktop.

If you have a link, I’d appreciate it if you posted it.

-Jeff

Have you tried this? It looks like it uses ROCm:

It is supported, but it is listed in the Ryzen APU pages:

I know it’s a bit counterintuitive (and I’ve been caught out several times), but APUs are categorized as Ryzen <..> not Radeon (which are for dGPUs).

So the page I linked to contains AMD’s recommendations for the Framework Desktop.

However, it’s for Ubuntu; AMD doesn’t support Fedora.

For Fedora, there are two options:

Djip, thanks for pointing this out. I will look into it. I’m not an AI/LLM researcher, so if ROCm gives me slightly better t/s rates than Vulkan, I have to balance the effort to get it running vs performance improvements. Vulkan so far has been quite good for my needs and stable with pretty much all the models I’ve played with so far. But I appreciate the updated information. I also note that it’s only been validated on Ubuntu using PyTorch with FP16 models. So I’m curious if it can be integrated and used by llama.cpp and support other quantizations. And specifically, that documentation says:

Lower than expected performance may be observed while running some LLM workloads (such as Llama 31B/3B) on AMD Ryzen™ AI MAX+395 processors.

I have come across the hipEngine project before, but I have not tried to set it up. To be clear, I’m hesitant to deep dive into a 3rd-party implementation. I’m willing to be on the edge of the development for LLM, but only when it’s stable. If anyone can share experience with hipEngine, I’m interested to see what your results have been, but I’ve got limited time to fiddle with alpha-level versions of software.

But thanks and keep the information flowing, I’m always looking.

-Jeff

TheRock looks interesting, but I don’t have a real need for PyTorch use at this time. I do need some TensorFlow backend in the near future, but I haven’t even started doing the research for that project.

Thanks again, all this information is useful.

-Jeff

@Jeffrey_Bakke : this is a good place to look:

The most complicated thing is that things have been moving very quickly lately, both on the Fedora and AMD/ROCm sides…

  • Fedora recently changed the way boot parameters are modified: it’s now done using grubby:
sudo grubby --update-kernel=ALL --args="..."
# ...
  • but the ttm size can even be changed by systemd …
# /etc/modprobe.d/ttm.conf
options ttm pages_limit=25165824
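
As a quick sanity check on that number: `pages_limit` is counted in memory pages, not bytes. The sketch below assumes the usual 4 KiB page size on x86-64 (an assumption, not something stated in the thread) and shows how the value above maps to a size in GiB, and how to derive a value for a different limit.

```python
PAGE_SIZE = 4096  # bytes; assumes the usual 4 KiB page size on x86-64

def gib_to_pages(gib, page_size=PAGE_SIZE):
    """Number of pages needed to cover `gib` GiB."""
    return gib * 1024**3 // page_size

def pages_to_gib(pages, page_size=PAGE_SIZE):
    """Size in GiB covered by `pages` pages."""
    return pages * page_size / 1024**3

print(pages_to_gib(25165824))  # the value from the ttm.conf line above -> 96.0
print(gib_to_pages(96))        # -> 25165824
print(gib_to_pages(80))        # an 80 GiB limit instead -> 20971520
```

So the setting above lets the GPU address up to 96 GiB of system memory.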

Next, amdgpu.gttsize is no longer needed. It is the old AMD GTT kernel parameter; the AMD driver now uses the “new” standard ttm parameters…

And… well this may change next week :wink:

So that’s why I try to correct the “old practices” I see so that they eventually disappear. :smiling_face_with_sunglasses:

And don’t worry, I know it’s difficult to find your way through all this information…

Thanks for the link, lots of good information there.

-Jeff

Hey Jeffrey,

Good news: ROCm does support Strix Halo (gfx1151), starting from ROCm 7.0.2. Setup:

1. Install ROCm 7.2 via your distro’s native package manager.

2. Set `HSA_OVERRIDE_GFX_VERSION=11.5.1` and `HSA_ENABLE_SDMA=0`.

3. Verify with `rocminfo | grep gfx1151`.

For Ollama, add both as `Environment=` lines in the systemd unit file.
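
A minimal sketch of what those lines could look like, generated as a systemd drop-in rather than by editing the unit file itself. The helper name `render_dropin` is made up for illustration, and the unit name `ollama.service` is an assumption about how Ollama was installed; the two variables and their values are the ones from step 2.

```python
def render_dropin(env):
    """Render a systemd drop-in that adds Environment= lines to a service."""
    lines = ["[Service]"]
    for name, value in env.items():
        # Quoting the whole assignment keeps values with spaces intact.
        lines.append(f'Environment="{name}={value}"')
    return "\n".join(lines) + "\n"

# Values taken from step 2 above.
ROCM_ENV = {
    "HSA_OVERRIDE_GFX_VERSION": "11.5.1",
    "HSA_ENABLE_SDMA": "0",
}

print(render_dropin(ROCM_ENV), end="")
```

The printed text can be pasted into the editor opened by `sudo systemctl edit ollama` (assuming that unit name), followed by a restart of the service so the new environment takes effect.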

The hogeheer499-commits/strix-halo-guide repository on GitHub (“AMD Strix Halo local LLM guide: direct 100.0 t/s 30B Qwen MoE on Ryzen AI MAX+ 395 / Radeon 8060S. Setup, benchmarks, raw evidence.”) is actively maintained and covers the full workflow including BIOS settings, kernel params, and experimental PyTorch builds.