Framework instability with 0x08000800 fabric flood

Which Linux distro are you using?

Ubuntu

Which release version?
24.04.4 LTS

Which kernel are you using?
6.17.0-20-generic

Which BIOS version are you using?

3.0.3. (3.0.4 has an AMD bug that causes freezes.)

Which Framework Desktop model are you using? (AMD Ryzen™ AI Max 300 Series)
AMD Ryzen AI Max+ 395 w/ 128GB RAM


I continue to get crashes. With 3.0.4 I would get a GPU freeze-up. Downgrading to 3.0.3 worked for a while, until the system froze and restarted in the middle of a long-ish video conference.

last -x confirmed the crash. It wasn’t a reboot.

Confirmed reset reason was fabric flood:

sudo dmesg | grep -iE 'whea|machine check|hardware error|corrected error'

[ 2.203806] x86/amd: Previous system reset reason [0x08000800]: an uncorrected error caused a data fabric sync flood event
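
For anyone comparing reset values across machines: assuming the bracketed value is a bit mask (the kernel's wording suggests this, but treat it as an assumption), a small shell sketch can list which bits are set. `set_bits` is a hypothetical helper name. Mapping bit numbers to causes needs the kernel's own table, which is not reproduced here.

```shell
#!/usr/bin/env bash
# List the bit positions that are set in a reset-reason value.
# set_bits is a hypothetical helper, not a kernel or distro tool.
set_bits() {
    local value=$(( $1 ))   # accepts hex (0x...) or decimal input
    local bit out=""
    for (( bit = 0; bit < 32; bit++ )); do
        if (( (value >> bit) & 1 )); then
            out+="${out:+ }$bit"   # space-separated list of set bits
        fi
    done
    echo "$out"
}

set_bits 0x08000800   # prints: 11 27
```

Two machines reporting the same set of bits are at least hitting the same recorded reset condition, which helps when comparing notes in a thread like this.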

Sniffing around, I see I’m not the only one with this problem.

Absolutely ZERO BIOS changes have been made at this point. Had some fan issues, bought some headphones. That’s the gist of it. Trying to work. Just work. And this machine seems exceptionally unstable at this point.

Does anyone know if this issue is being investigated for inclusion in 3.0.5? Are there any advanced BIOS settings that can prevent this from happening? Spending a lot of money on a pretty paperweight isn’t what I had in mind.


Sniffing around, I see I’m not the only one with this problem.

Yeah, I am having problems like this too. For me it restarts once a day, or sometimes 3-4 times a day. It becomes unusable at times. I also have an AMD GPU, on Linux (AlmaLinux).

Probably a good place to start is here:

The comments about AER in that thread are a good place to start.

I appreciate the tip, James, but that thread is for a version of BIOS that does not exist for the laptop AND is a different AMD architecture. Can you help me understand which portion of that AER issue you believe is the “relevant” one? The thread also covers at least 14 months of feedback.

Thanks.

Try adding this to your kernel boot parameters:
pcie_ports=native pcie_ecrc=on

Then see if you get any AER errors in the logs.
If you do, they will tell you which PCIe device might be causing the sync flood.

AER errors are only one possible cause of the sync flood, but it is worth a try.
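
For anyone following along on Ubuntu, here is a rough sketch of how those parameters could be added. It assumes the stock layout, where `/etc/default/grub` contains a `GRUB_CMDLINE_LINUX_DEFAULT="..."` line. `add_kernel_params` is a hypothetical helper, and the demo below only edits a temporary file. Other distros handle this differently.

```shell
#!/usr/bin/env bash
# Append kernel parameters to the GRUB_CMDLINE_LINUX_DEFAULT line of a file.
# add_kernel_params is a hypothetical helper, not part of GRUB.
# Relies on GNU sed (-i), as shipped with Ubuntu.
add_kernel_params() {
    local file=$1; shift
    local param
    for param in "$@"; do
        # Simple substring check so re-running does not add duplicates.
        if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$file"; then
            continue
        fi
        # Insert the parameter just before the closing quote of the line.
        sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"|\1 ${param}\"|" "$file"
    done
}

# Demo on a scratch file; the real config is not touched here.
demo=$(mktemp)
echo 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"' > "$demo"
add_kernel_params "$demo" pcie_ports=native pcie_ecrc=on
cat "$demo"
rm -f "$demo"
```

On a real system the same edit would be made to `/etc/default/grub` as root (back the file up first), followed by `sudo update-grub` and a reboot. Afterwards, `cat /proc/cmdline` should show both parameters, and `sudo dmesg | grep -i aer` will show any AER messages the kernel has logged.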

Thanks, James. I will add those to GRUB and see where we get. Sincerely appreciate you pulling that up out of that lengthy thread.