Framework Desktop General Stability

Dear Framework-Team,

I’m just a few mouse clicks away from ordering a Framework Desktop as my “24/7 homeserver” (running Linux + local LLMs). But reading the posts in the forum leaves me somewhat unsettled about whether the 3,xK solution is mature enough to fulfill my needs. I see AMD driver issues, SMU freezes and many unsuccessful attempts to solve the issues (with great community effort but maybe not enough support from AMD’s side?). So my question is: how confident are the Framework Desktop users/creators that the issues will finally be solved and the great hardware can be used as a daily driver in a productive environment? Are there (maybe still suboptimal) workarounds which can at least fully stabilize the situation?

Thank you very much in advance for your recommendations

I’ve been running a Desktop as a Proxmox server for about a month now with no issues. I use a JetKVM instead of an actual screen, but I do use virgl with one of my VMs and it works fine and much faster than SW emulation. I do not run LLMs though, so I don’t have any experience with whatever issues might crop up with that.

Thank you very much for your response!

Are there any other experiences regarding 24/7 LLM inference use on the 128 GB version without any freeze/hang up for weeks? If so, what configuration (distro/driver) are you using?

Tom:

I find myself regretting my Framework purchase. Running Ubuntu 26.04 (newly upgraded from 24.04) and have the 128GB version. I bought it for LLMs and I can’t do anything with it other than develop. It’s an expensive replacement for my traditional AMD arch machine.

Play a video, it freezes. Look at it funny, it reboots. Fabric floods (0x08000800) and AMD GPU issues plague it. And the issues increase with memory usage so LLMs have been out of the question.

I ALSO updated recently to 3.0.5 for the desktop in hopes this resolved the issue and while it DID fix the long boots of 3.0.4 I am rolling back to 3.0.3 because it seems to make the GPU issue worse, not better.

My two cents. The reports you’re seeing are real and they seem to affect Linux most. Was hopeful the upgrade to kernel 7 would address a lot of it and it may. But it’s still unstable.

Hey Andrew, thank you very much for your open words. You are describing exactly the scenario I do not want to end up in. I’m a nerd and of course willing to invest time to improve and optimize, but having frequent freezes (partially without any suspected root cause) would be an absolute nightmare. For Ubuntu 24.04 I would not have expected stable behaviour, because I could not even get my current system to boot with it (the reason was the GeForce 5070 Ti) and had to migrate to Ubuntu 25.10, which then worked well (I expect many drivers were not up to date). With 26.04 I have no experience yet, but from “gut feeling” I would go with Bazzite because of the “official framework support” and the “latest graphic drivers”. Did you try Bazzite? Regards, Tom

FWIW mine was pretty stable until this week. What I experience now is the video freezes for a minute or so and then the video output is lost. I am forced to reboot. I’m on Fedora. Some googling indicated it’s related to some kind of standby or suspend feature which is confusing because I’m actively using it (watching YouTube and playing Diablo 2). I’ve tried tweaking startup parameters but so far nothing has solved it.

Edit: someone else here has a better fix but I found using an older kernel resolved the issue for now. Anecdotally it’s stable for my use case and no longer freezes.

Work requires Debian, @Tom5 , but it shouldn’t matter. The issue I have is we, the users/consumers, are the guinea pig for a set of problems that shouldn’t be in our court. The technical users are being asked to do the tech for a machine we paid a lot of money for. Heaven forbid you’re not technical because you’re pretty much doomed.

So I’ll just reiterate that I wouldn’t recommend it right now for LLM loads. And firmware fixes are likely not coming because it took us 4 months to fix a boot time issue that shouldn’t have made it out of testing.

And it seems like you’re talking about an NVidia issue given the “latest graphic driver” and “Geforce 5070Ti” notes. All of this is on AMD GPU and Strix Halo. And all of it just falls flat. All of it.

Which I get. Here’s the thing though…Ubuntu isn’t Debian.

It can use the same packages, but that is it. If you can I’d try straight Debian, particularly in a work-sensitive case where you want maniacal stability and uptime. My first gut instinct says ‘Ubuntu is doing weird Ubuntu things’, because weirdness™(C)(R) with Ubuntu happens a lot. And it is why many people who no longer use Ubuntu, no longer use Ubuntu.

AMD, unfortunately, has a long history of teething problems with AGESA. I’m not sure there’s a chipset or motherboard out there that didn’t have similar teething problems when new. And in Framework’s case it happened at the worst possible time (right before, and then overlapping, the US and Chinese holiday seasons). It doesn’t excuse not stomping the GRUB long-boot issue before release, but I’m guessing that is much of why it took so long.

I misspoke in my frustration. Work requires Ubuntu which is Debian based. Specifically, the client for which we are doing a substantial amount of work this machine was to proof required Ubuntu and as you aptly put, Ubuntu will do Ubuntu things.

That said, in all of these threads on the forums, it’s not Ubuntu, Debian, Fedora, Slackware… It’s the Linux kernel and AMD. Plenty of threads report these problems on other distros. And I have seen nothing official to say these are the workarounds to bandaid it or that they’re even being worked on, let alone acknowledging the issue. 4 months for “memory fixes” and a boot problem which we both agree was a) obvious and b) shouldn’t have ever been released is unacceptable. And it’s a bad omen of what’s to come.

It’s true that AMD has a history of neglecting its drivers until after the replacement has already shipped. It took more than a year for AMD to get around to supporting AI on my “AI” 300.

That said, my FW Desktop running Bazzite has been fine. I haven’t run LLMs on it for weeks solid, but I’ve run them for days, no issues. Other than my frustration over AMD neglecting ROCm and NPU on Linux, and always banging into the Desktop’s paltry 256 GB/s memory bandwidth, I can’t complain.

Maybe that’s because I don’t often put mine to sleep? It draws so little power when idle that I just keep it powered on and listening.

Thanks a lot to all of you for the exchange. This gives me a good overall picture. And it’s interesting to see how differently the devices are behaving, which on the other hand gives the impression that some “hardware topics” could be part of the equation (voltages, thermals or other things)… But in general I don’t want to put the device to sleep but to let it run 24/7, which means I would not need any “suspend to RAM” states.

In the last days I spent some time to understand the background of the “instability” a little better. I see at least two paths to trigger the freeze:

  1. Hardware Video acceleration in Youtube + Chrome and some other browsers leading to an SMU deadlock.

  2. Heavy CPU load with high memory usage in LLM inference.

Depending on kernel/firmware/driver, there are valid workarounds for some of the issues, but for others we need AMD to stabilize the situation. From what I have gathered so far, AMD is working on topic 1), but I doubt that it will also solve 2).

@bron to not waste too much time playing around with kernels, firmwares and distros: What combination of Distro (Bazzite-Version), Kernel-Version, Firmware-Version, Inference-Environment (LMStudio, llama.cpp, vLLM) and LLM-Driver (Vulkan+RADV or ROCm) are you using? What do you mean by “after the replacement has shipped”?

Thank you very much,

Tom

  • Bazzite stable
  • 6.17.7-ba24
  • BIOS 3.02
  • RamaLama, LMStudio
  • I usually use Vulkan because it feels more stable and isn’t appreciably slower. I haven’t experimented much.
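To make reports like the list above easier to compare between machines, a small script could collect the same fields automatically. This is only a minimal sketch, assuming a Linux system that exposes `/etc/os-release`, `/sys/class/dmi/id/bios_version` and `/proc/cmdline`; the function names are made up for illustration, and the inference environment and backend still have to be noted by hand.

```python
import platform
from pathlib import Path


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines in the os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def read_text(path: str) -> str:
    """Return the stripped file content, or 'unknown' if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return "unknown"


def collect_report() -> dict:
    """Gather distro, kernel, BIOS version and kernel command line."""
    os_release = parse_os_release(read_text("/etc/os-release"))
    return {
        "distro": os_release.get("PRETTY_NAME", "unknown"),
        "kernel": platform.release(),
        "bios": read_text("/sys/class/dmi/id/bios_version"),
        "cmdline": read_text("/proc/cmdline"),
    }


if __name__ == "__main__":
    for key, value in collect_report().items():
        print(f"{key}: {value}")
```

Pasting the output into a post alongside the inference tool and backend (Vulkan or ROCm) would give others the full combination in one place.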

What do you mean by “after the replacement has shipped”?

I mean that AMD’s driver support appears to lag a full generation behind. They didn’t support AI features on my 7040 mainboard until well after the AI 300 was announced, close to when it shipped. They didn’t support the AI 300 until after the Strix Halo shipped. It’s completely unreasonable to ship hardware that doesn’t receive software support for 1 to 2 years.

Just thought of another thing that might help the stability I’m seeing: I’m usually running around 30GiB models on this 128GB machine. I’ve run 100ishGiB models, but I find them too slow to be useful. 27B parameters seems to hit my sweet spot of functionality and speed.

it appears I could have bought the 64GB machine instead of the 128…
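For anyone weighing the 64GB against the 128GB board, a rough footprint estimate can help. The sketch below only counts the weights (parameters times bits per weight); the `reserve_gb` headroom for the OS, KV cache and other workloads is a made-up placeholder, not a measured value, and sizes are in GB (1e9 bytes) rather than GiB.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, reserve_gb: float = 16.0) -> bool:
    """True if the weights fit after setting aside reserve_gb.

    reserve_gb is a hypothetical allowance for OS, KV cache and
    other workloads; adjust it to your own setup.
    """
    needed = weight_footprint_gb(params_billion, bits_per_weight)
    return needed <= memory_gb - reserve_gb


if __name__ == "__main__":
    for bits in (4.5, 8, 16):
        size = weight_footprint_gb(27, bits)
        print(f"27B at {bits} bits/weight: about {size:.1f} GB, "
              f"fits in 64 GB: {fits(27, bits, 64)}")
```

By this estimate a 27B model at 8 bits per weight needs about 27 GB for its weights, which is in line with the roughly 30GiB models mentioned above.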

All:

Mario posted a fix for a lot of my issues. You can see it in the AMD GitLab tracker.

You can build a custom kernel right now that should mask the issue by disabling idle power changes.

I personally used cwsr_enable=0 and I’ve been up and running for a long while: 3 days. And that beats the freezes every few hours. Mario said, and I believe, that this probably shouldn’t fix it but, for me, it has introduced a small level of stability on kernel 7 and Ubuntu 26.04.

Also note, this definitely appears NOT to be an issue with Framework per se. It’s an AMD driver issue almost entirely. Hopefully it gets fixed.

Just a heads up the suggested fix will only solve one out of 3 triggers of the bug. Or at least the current assumption is that it’s always the same bug.
The summary of @Tom5 was quite good except that for the second bug it’s not CPU load but GPU load instead. Or it could be related to the unified memory and some memory handling component. Unfortunately, it’s currently not known what exactly causes the issue (or at least I was not able to find it). The linked freedesktop issue is the best source for up to date information. Overall, I’d say the board is nice and it would work well if the issue is fixed. I have the same plans as @Tom5 to use the board as NAS/VM/Container host with some LLM running on it as well. I got the 128GB version, so there is plenty of space for a ~30B model and some other workload.

Great exchange guys and thanks for the links… :+1: … Mario’s statement makes me a little nervous tbh… :wink: “Just as an update to this issue - the symptom of the hang is well understood but the root cause isn’t.” But it’s great to see that AMD engineers are close to the community (now). I had so many issues with AMD graphics products (driver-wise) over the years that I had to switch back to NVidia… But now the time has come to give it another chance. :wink:

Update from my side: My framework desktop arrived at the beginning of this week. I have no 24/7 uptime yet, but I’m testing it heavily and haven’t had any serious issues yet. What is my current setup?

  • Plain Framework Mainboard (with no PCI-card yet - will need a SATA enhancement in the future, no WIFI card)
  • Assembled in a (non-Framework) ITX case with more volume, meshed walls, better airflow and a stronger PSU, to rule out any thermal or voltage related causes

Regarding the software stack, I got lost in a rabbit hole during my research upfront, but then luckily found this great page from @kyuz0: kyuz0/amd-strix-halo-toolboxes on GitHub, which saved me a lot of time. @kyuz0 your work is highly appreciated!! :+1: I just followed his recommendations and installed Fedora 43, downgraded the kernel to version 6.18.5-200.fc43.x86_64 and set the recommended kernel parameters. With intense vibe coding and vane research I could not provoke any close-to-hardware issue yet. In addition I set the amdgpu.no_vpe_idle_pg=1 parameter, but it seems that it even got “lost” during some of my tests. (I do not use the related trigger path very often, so I will try without it for now.)
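Since a kernel parameter can silently go missing (for example after a kernel change or a bootloader regeneration), it may be worth checking the running kernel’s command line rather than trusting the config file. The following is a rough sketch, assuming the parameters were passed on the kernel command line and show up in `/proc/cmdline`; the helper names are hypothetical, and the simple whitespace split does not handle quoted values containing spaces.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline: str, expected: dict) -> list:
    """Return the expected 'name=value' pairs that are absent or set differently."""
    actual = parse_cmdline(cmdline)
    return [f"{name}={value}" for name, value in expected.items()
            if actual.get(name) != value]


if __name__ == "__main__":
    # The parameter from the post above; extend the dict with whatever you rely on.
    expected = {"amdgpu.no_vpe_idle_pg": "1"}
    try:
        with open("/proc/cmdline") as handle:
            running = handle.read()
    except OSError:
        running = ""
    missing = missing_params(running, expected)
    print("all expected parameters active" if not missing
          else f"missing or different: {missing}")
```

Running this after every reboot (for instance from a login script) would flag a “lost” parameter before a long inference job starts.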

Until now I’m very pleased… The machine is doing great (and extremely efficient btw). With all the workarounds implemented I see 12 Watts in idle (it was 11.something at the beginning) and 140-150 Watts during token generation.
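For a 24/7 machine these wattages translate into yearly energy fairly directly. The sketch below is back-of-the-envelope arithmetic using the 12 W idle figure and the midpoint of the 140-150 W range from this post; the fraction of time spent generating tokens (`load_fraction`) is a pure assumption to be replaced with your own usage.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(idle_watts: float, load_watts: float, load_fraction: float) -> float:
    """Yearly energy in kWh for a machine that is always on.

    load_fraction is the share of time spent under load (0.0 to 1.0);
    the rest of the time is assumed to be idle.
    """
    average_watts = idle_watts * (1 - load_fraction) + load_watts * load_fraction
    return average_watts * HOURS_PER_YEAR / 1000


if __name__ == "__main__":
    for fraction in (0.0, 0.1, 0.5):
        print(f"{fraction:.0%} of the time under load: "
              f"{annual_kwh(12, 145, fraction):.0f} kWh/year")
```

At pure idle this comes to roughly 105 kWh per year; multiply by your local electricity price for a cost estimate.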

The memory bandwidth is good, but of course could be better. :wink: I had to realize that using Qwen 3.6 27B can be very painful, due to 11,x T/s, decreasing further with context length. (Unfortunately DFLASH-draft was not working for me - it seems to be dedicated to CUDA.) So now I’m working with Qwen3-Coder-Next and Qwen 3.6 35B, which really impress me. QCN at some point (with AI opponent capabilities in my test-vibed chess game) looped, but when I reset the context and let Q3.6 run, it took two iterations - 1 for analysis and 1 for correction - and everything was fine again (maybe a context reset would have done the same for QCN). So in general I’m really impressed by what is possible with local AI in the meantime. :+1:
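The drop in speed with context length fits a simple bandwidth-bound picture: for each generated token the weights in use (and the KV cache so far) have to be read from memory, so tokens per second cannot exceed bandwidth divided by bytes read per token. The sketch below is that upper bound only, assuming the roughly 256 GB/s figure quoted earlier in the thread as a theoretical peak; the model and cache sizes in the example are hypothetical, and real throughput will be lower.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, weights_read_gb: float,
                       kv_cache_read_gb: float = 0.0) -> float:
    """Upper bound on tokens/s when decoding is limited by memory bandwidth.

    weights_read_gb: weights touched per token (all of them for a dense
    model, only the active experts for a mixture-of-experts model).
    kv_cache_read_gb: KV cache read per token, which grows with context.
    """
    return bandwidth_gb_s / (weights_read_gb + kv_cache_read_gb)


if __name__ == "__main__":
    bandwidth = 256.0  # GB/s, theoretical peak taken from the thread
    # Hypothetical dense model with 17 GB of quantized weights.
    print(f"empty context: {decode_ceiling_tps(bandwidth, 17.0):.1f} T/s ceiling")
    # Same model with a hypothetical 4 GB of KV cache read per token.
    print(f"long context:  {decode_ceiling_tps(bandwidth, 17.0, 4.0):.1f} T/s ceiling")
```

The same bound would explain why a model that only reads a fraction of its weights per token (if the 35B model is a mixture-of-experts design, which is an assumption here) can feel much faster than a dense 27B one.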

And of course I hope that AMD can mitigate all the MES + driver issues, so that I can migrate to kernel 7.x soon… :wink:

Btw. I’m still on framework firmware 3.3 … is there any reason to think about an update? Seems that I have “old” RAM on my board and I see more issue reports for 3.4 and 3.5…

I got my FW Desktop 128GB in October and have been using it as my main machine for code dev. Very solid, no glitches, reboots, etc. The only exception is ROCm: if I use llama.cpp with ROCm I eventually get GPU hangs / crashes. It works fine with the Vulkan backend though, at about the same perf. I’ve run LLMs overnight without problems.