FW13 AMD AI 300 (HX 370): 48 Data Fabric Sync Flood crashes in 2 months — comprehensive data

Hi everyone,

I’m sharing a detailed report of persistent Data Fabric Sync Flood crashes (0x08000800) on my Framework 13 AMD Ryzen AI 300 in the hope that the data helps Framework and AMD engineers root-cause this issue. I’ve been systematically logging every crash since December 2025.

@Jesse_Darnley mentioned finding a reproducible trigger in June 2025 (power-adapter related, since fixed), but I haven’t seen further updates. This post adds a large, methodical dataset from a different angle: my crashes happen during normal use, not just at sleep/wake, and reproduce on a stock Ubuntu live USB — ruling out custom kernels and installed software.

System Information

| Component | Value |
|---|---|
| Laptop | Framework Laptop 13 (AMD Ryzen AI 300 Series) |
| CPU | AMD Ryzen AI 9 HX 370 w/ Radeon 890M |
| RAM | 2×48 GB Crucial DDR5 (96 GB total); originally 1×48 GB (Framework stock) |
| Storage | 1 TB WD_BLACK SN770 NVMe, firmware 731100WD |
| Wi-Fi | Intel AX210 |
| BIOS | 03.05 (2025-10-30) |
| Kernel | 6.18.0-fw13 (custom built from mainline); previously 6.14-1016 (Ubuntu) |
| OS | Ubuntu 24.04.3 LTS |
| Kernel args | amdgpu.dcdebugmask=0x12 (disables PSR + Stutter mode); just changed to 0x412 (adds Panel Replay disable) |
| Power profile | Balanced |

The Problem

The dmesg message after every crash:

x86/amd: Previous system reset reason [0x08000800]: an uncorrected error caused a data fabric sync flood event

The crash is near-instantaneous — no kernel panic, no oops, no pstore data, no kdump capture. The hardware simply resets. Occasionally I notice a brief freeze (~5 seconds) before the reset, sometimes with a CPU core spiking to 100% in the system monitor. The only post-mortem evidence is the reset reason register read at next boot.
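For anyone who wants to check this on their own machine, the two standard places to look after an unexpected reboot are the kernel log of the new boot and pstore (generic commands, nothing specific to my setup):

```
# The reset reason is only logged by the kernel of the boot *after* the crash
journalctl -k -b 0 | grep -i "reset reason"

# pstore would contain a panic/oops dump if the kernel had had time to write one;
# with these sync floods it stays empty
ls -l /sys/fs/pstore/
```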

Crash Statistics: 48 Sync Floods

I log every crash with DIMM temperatures (from collectd/spd5118), awake uptime between crashes, and activity at crash time. DIMM temperature monitoring was added starting with crash #7. Here is the full table:

| # | Date | Uptime | RAM | Kernel | DIMM temps (°C) |
|---|---|---|---|---|---|
| 1 | 2025-12-02 11:58 | ? | 1×48 | 6.14 | |
| 2 | 2025-12-02 12:35 | < 1 h | 1×48 | 6.14 | |
| 3 | 2025-12-03 20:15 | ~28 h | 1×48 | 6.14 | |
| 4 | 2025-12-11 18:38 | ? | 2×48 | 6.14 | |
| 5 | 2025-12-11 19:28 | < 1 h | 2×48 | 6.14 | |
| 6 | 2025-12-11 20:13 | < 1 h | 2×48 | 6.14 | |
| 7 | 2025-12-15 15:46 | ~41 h | 2×48 | 6.18 | 56–61 |
| 8 | 2025-12-23 16:19 | ~7 h | 2×48 | 6.18 | 47–50 |
| 9 | 2025-12-24 10:24 | ~1 h | 2×48 | 6.18 | 59–67 |
| 10 | 2025-12-25 07:04 | ~21 h | 2×48 | 6.18 | 61–66 |
| 11 | 2025-12-25 14:48 | ~8 h | 2×48 | 6.18 | 50–53 |
| 12 | 2025-12-26 04:56 | ~2 h | 2×48 | 6.18 | 65–72 |
| 13 | 2025-12-26 06:36 | ~1 h 24 | 2×48 | 6.18 | 42–46 |
| 14 | 2025-12-28 05:50 | ~23 h | 2×48 | 6.18 | 47–51 |
| 15 | 2025-12-31 04:30 | ~35 h | 2×48 | 6.18 | 52–54 |
| 16 | 2025-12-31 12:09 | ~4 h | 2×48 | 6.18 | 68–73 |
| 17 | 2026-01-01 07:16 | ~10 h | 2×48 | 6.18 | 57–71 |
| 18 | 2026-01-01 10:06 | ~3 h | 2×48 | 6.18 | 60–66 |
| 19 | 2026-01-06 09:00 | ~64 h | 2×48 | 6.18 | 57–60 |
| 20 | 2026-01-06 10:39 | ~1 h 37 | 2×48 | 6.18 | 61–65 |
| 21 | 2026-01-06 11:32 | ~51 min | 2×48 | 6.18 | 52–54 |
| 22 | 2026-01-07 08:39 | ~12 h | 2×48 | 6.18 | 56–66 |
| 23 | 2026-01-10 10:24 | ~41 h | 2×48 | 6.18 | 57–64 |
| 24 | 2026-01-12 02:54 | ~23 h | 2×48 | 6.18 | 49–51 |
| 25 | 2026-01-12 15:31 | ~12 h | 2×48 | 6.18 | 54–58 |
| 26 | 2026-01-14 05:53 | ~20 h | 2×48 | 6.18 | 55–57.5 |
| 27 | 2026-01-15 10:58 | ~21 h | 2×48 | 6.18 | 57–62 |
| 28 | 2026-01-15 13:09 | ~2 h | 2×48 | 6.18 | 50–53 |
| 29 | 2026-01-17 01:14 | ~18 h | 2×48 | 6.18 | 48.5–64 |
| 30 | 2026-01-19 05:49 | ~26 h | 2×48 | 6.18 | 51.5–53.5 |
| 31 | 2026-01-20 11:36 | ~20 h | 2×48 | 6.18 | 75–81 |
| 32 | 2026-01-24 08:29 | ~54 h | 2×48 | 6.18 | 61–71 |
| 33 | 2026-01-26 04:09 | ~14 h | 2×48 | 6.18 | 56–63 |
| 34 | 2026-01-27 07:47 | ~18 h | 2×48 | 6.18 | 62–71.5 |
| 35 | 2026-01-27 10:04 | ~2 h 17 | 2×48 | 6.18 | 63–68.5 |
| 36 | 2026-01-28 03:45 | ~11 h | 2×48 | 6.18 | 54–61.5 |
| 37 | 2026-01-28 04:02 | ~15 min | 2×48 | 6.18 | 62–69 |
| 38 | 2026-01-30 13:27 | ~37 h 30 | 2×48 | 6.18 | 57.5–61.5 |
| 39 | 2026-01-31 08:15 | ~9 h 58 | 2×48 | 6.18 | 60–71 |
| 40 | 2026-01-31 08:37 | ~22 min | 2×48 | 6.18 | 60–67 |
| 41 | 2026-01-31 08:45 | ~7 min | 2×48 | 6.18 | 63.5–72.5 |
| 42 | 2026-01-31 12:44 | ~3 h 55 | 2×48 | 6.18 | 48–51 |
| 43 | 2026-02-01 10:54 | ~8 h 53 | 2×48 | 6.18 | 50–52.5 |
| 44 | 2026-02-01 16:19 | ~8 min | 2×48 | 6.18 | 60–67 |
| 45 | 2026-02-02 17:26 | ~18 h | 2×48 | 6.18 | 62.5–67.5 |
| 46 | 2026-02-03 01:41 | ~1 h 36 | 2×48 | 6.18 | 62–68 |
| 47 | 2026-02-03 01:55 | ~13 min | 2×48 | 6.18 | 64.5–72 |
| 48 | 2026-02-03 ~04:50 | ~2 h 51 | 2×48 | 6.11* | 67–69 |

* Crash #48 occurred on a stock Ubuntu 24.04.3 live USB (kernel 6.11.0-17-generic, no custom kernel args, no amdgpu.dcdebugmask, no encrypted root, no collectd/Docker). Same 0x08000800 reset code.
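For anyone who wants to log the same DIMM temperature data: recent kernels include an spd5118 hwmon driver that exposes the DDR5 SPD hub temperature sensors, which lm-sensors can then read. A minimal sketch, assuming the driver is available as a module on your kernel:

```
# Load the DDR5 SPD hub temperature sensor driver (may already be built in or autoloaded)
sudo modprobe spd5118

# Each DIMM then shows up as an spd5118-* chip with one temperature reading
sensors | grep -A 3 -i spd5118
```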

Uptime between crashes ranges from 7 minutes to 64 hours. Average is roughly 12–15 hours of awake time. Last week (Jan 27 – Feb 3): 15 crashes, average ~7 h 40 min, min 7 min, max 37 h 30 min — the frequency is increasing. Note: uptime is cumulative awake time only — suspend periods are excluded. Longer uptimes span multiple wake/suspend cycles (e.g., the 64 h entry spans 11 sessions over 5 days).

What I’ve Ruled Out

| Variable | Tested | Result |
|---|---|---|
| Temperature | Crashes at DIMM temps 42–46 °C (cold) and 75–81 °C (hot). Ran 1 h 45 min video call at 75–77 °C without crash. Ran 1 h+ session at DIMM2 83–84 °C SPD (potentially 96–101 °C hotspot) without crash. Next crash was at 57 °C. | Not the cause |
| Cooling | Used laptop cooling stand with fans for weeks — dramatic temp reduction, zero impact on crash frequency | Not the cause |
| Kernel | 6.14-1016 (Ubuntu stock), 6.18.0-fw13 (custom mainline), 6.11.0-17 (stock Ubuntu live USB). 6 crashes on 6.14, 41 on 6.18, 1 on stock 6.11 | Not the cause |
| Custom software | Live USB test: stock Ubuntu 24.04.3, no custom kernel args, no amdgpu.dcdebugmask, no encrypted root, no collectd/Docker — crashed after ~2 h 19 min | Not the cause |
| RAM config | 1×48 GB from Framework → 2×48 GB Crucial DDR5 | Not the cause |
| iGPU VRAM | BIOS: 0.5 GB → 16 GB | No effect |
| Power supply | Framework charger + third-party 100 W PSU | No effect |
| CPU load | Crashes during idle, during terminal work, during compilation, during Firefox | No correlation |
| amdgpu PSR | amdgpu.dcdebugmask=0x12 — this fixed an earlier, much worse crash pattern (crashes within minutes of boot). Sync floods still occur with it. | Mitigates a different issue |

What I Haven’t Tried Yet

  • amdgpu.dcdebugmask=0x412 — just applied, adds Panel Replay disable (DC_DISABLE_REPLAY) to my existing PSR + Stutter disable. No data yet on whether it changes crash frequency.

Key Observations

  1. Reproduces on stock Ubuntu live USB. Crash #48 occurred on an unmodified Ubuntu 24.04.3 live USB (kernel 6.11.0-17-generic) — no custom kernel args, no amdgpu.dcdebugmask, no encrypted root, no installed software. This rules out my kernel build, configuration, and software stack as contributing factors. The issue is firmware or hardware.

  2. amdgpu.dcdebugmask=0x12 mitigates a related but separate issue. Without it, my first install on the HX 370 board had crashes within minutes of boot — sometimes before the kernel fully loaded. With it, I get daily-ish crashes instead. However, the live USB crashed after ~2 h 19 min without this flag, suggesting the display controller / PSR triggers a more aggressive crash pattern, while the sync floods are a distinct underlying problem.

  3. Crashes happen during active use AND idle. Several crashes occurred while I was away from the computer (lid open, system idle, no screensaver). One notable crash (#14) happened a few minutes after I left to eat — could be a power state transition.

  4. Clustering pattern: Jan 31 had 4 crashes (08:15, 08:37, 08:45, 12:44). Once the system starts crashing, it tends to crash again soon: the gaps between the first three crashes were only 22 min and 7 min.

  5. The RDSEED bug exists on my CPU. The kernel logs “RDSEED32 is broken. Disabling the corresponding CPUID bit.” This is a known AMD hardware bug on the HX 370. While the kernel works around it for random number generation, it signals silicon-level issues on this platform.

What Would Help

  • Framework engineering: Is there any firmware/EC diagnostic I can run? I’m happy to install fw-ectool, run custom kernels, or enable any debug tracing you need. I have collectd logging temperatures, detailed Framework diagnostic logs for each crash, and can provide anything else.

  • Other FW13 AI 300 (HX 370) users: Are you seeing 0x08000800 in your dmesg? Run journalctl -b 0 | grep "reset reason" after an unexpected reboot (a small loop that scans your last few boots is sketched after this list). Please report your findings here. Also, if your FW13 HX 370 is running stable on Linux, I’d love to hear about it — I’m trying to determine whether this is a widespread platform issue or specific to my unit, and positive data points matter as I’m considering a replacement.

  • Framework team: It would help the community to know roughly how many RMAs have been filed for sync flood / 0x08000800 crashes on any Framework model (FW13/FW16) with an AMD processor. Understanding whether this affects a small batch or a significant portion of units would help owners decide whether to wait for a fix or request a replacement.
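Since the reset-reason line only appears in the boot immediately after a crash, here is the small loop mentioned above for scanning your recent boot history (assumes persistent journald storage, which is the Ubuntu default):

```
# Check the kernel log of the current boot and the five before it for the reset reason
for b in 0 -1 -2 -3 -4 -5; do
    echo "=== boot $b ==="
    journalctl -k -b "$b" --no-pager 2>/dev/null | grep -i "reset reason"
done
```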

RMA Status

I have an open support ticket with Framework. They’ve asked me to provide diagnostic logs (using their log-helper script), which I’ve done for every crash. Awaiting next steps.

Related Threads & References

Framework Community:

GitHub:

Non-Framework reports (same error):

Kernel / AMD:


What devices do you have plugged into the card slots? I have proved that devices can cause this.

The devices I tested varied enough, while the crashes stayed consistent enough, to undermine the hypothesis that they are involved. I have two setups with different screens/docks, and I also tested with nothing connected at all; I had at least one crash in each configuration. Many crashes occurred with the same setup, with no changes to it between runs, at frequencies ranging from minutes after boot to several days. A crash never happened while connecting or disconnecting a device. Some of my devices are clearly faulty (one dock developed visible issues over time) and trigger plenty of connection problems in Ubuntu or during hardware detection, but I have never been able to link them to a crash in any clear way. I used two different 4K screens over long periods, through two different docks and also connected directly, with no change in crash frequency. Crashes also happened with no dock and no device connected at all.

By any chance, do you own a Framework on AMD? Do you get random freezes?

Hi,

I started this, so yes:

For background, there have been multiple false negatives while investigating this, so nothing should be ruled out unless you have physically verified it yourself, or at least multiple people have reproduced it.

It might also be mitigated with the kernel parameter:
processor.max_cstate=1

Can you see if it helps your situation? You seem to be able to reproduce it more often than I can.

Taken from:
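On Ubuntu, one way to test this is to add the parameter to the kernel command line via GRUB (a sketch; adjust for your distro and bootloader):

```
# 1. Append the parameter to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.:
#      GRUB_CMDLINE_LINUX_DEFAULT="quiet splash processor.max_cstate=1"
sudoedit /etc/default/grub

# 2. Regenerate the GRUB config and reboot
sudo update-grub
sudo reboot

# 3. Verify the running kernel picked it up
cat /proc/cmdline
```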

Update: 85 sync floods in 3 months (was 48 in 2 months)

37 more crashes since my original post. Now at 85 Data Fabric Sync Flood crashes over 3 months (Dec 2025 – Mar 2026). Here’s what I’ve tested and learned.

New mitigations tested — neither worked

amdgpu.dcdebugmask=0x412 (applied Feb 4): upgraded from 0x12, adds Panel Replay disable. 37 sync floods since. No improvement.

processor.max_cstate=1 (restricts CPU to C1 halt, no deeper C-states): tested twice.

  • First run (Feb 5–8): zero sync floods, but it caused severe suspend hangs — the system couldn’t wake from s2idle. Three consecutive suspend hangs, one iwlwifi soft lockup, one networking failure. The suspend hangs were a worse problem that may have hidden the sync floods during this period.
  • Second run (Feb 27+): re-added with suspend disabled to avoid the hang issue. 4 sync floods in ~3 days, about the same rate as without it. No effect.

Crash clustering continues

  • Feb 19: 5 crashes in one day (08:16, 10:03, 10:26, 10:43, 11:28 — four within 85 min)
  • Feb 23: 3 crashes in 73 minutes (01:24, 01:28, 02:37)
  • Uptime between crashes still ranges from 1 minute to 35 hours

New crashes (#49–85)

| # | Date | Uptime | DIMM temps (°C) | Notes |
|---|---|---|---|---|
| 49 | 2026-02-04 14:48 | ~24 h 26 | 69.5–74.5 | first with dcdebugmask=0x412 |
| 50 | 2026-02-04 15:17 | ~28 min | 65–70.5 | |
| 51 | 2026-02-04 16:09 | ~50 min | 59–72.5 | |
| 52 | 2026-02-05 08:46 | ~7 h 47 | 72.5–80.5 | |
| – | Feb 5–8 | – | – | max_cstate=1 active — 0 sync floods, 3 suspend hangs |
| 53 | 2026-02-08 06:25 | ~1 min | 51–67 | first boot after removing max_cstate=1 |
| 54 | 2026-02-12 10:13 | ~1 h 48 | 62–67 | |
| 55 | 2026-02-14 02:06 | ~21 h 15 | 58.5–64 | |
| 56 | 2026-02-14 04:55 | ~2 h 45 | 46.5–50 | |
| 57 | 2026-02-15 08:45 | ~6 h 15 | 65–69.5 | |
| 58 | 2026-02-15 11:42 | ~2 h 56 | 72–80 | |
| 59 | 2026-02-16 01:05 | ~7 h 54 | 78–85 | |
| 60 | 2026-02-17 06:36 | ~13 h 28 | 74.5–80 | |
| 61 | 2026-02-17 14:34 | ~7 h 54 | 65–73 | |
| 62 | 2026-02-18 10:38 | ~12 h 16 | 63–70 | |
| 63 | 2026-02-18 12:24 | ~1 h 46 | 69–74 | |
| 64 | 2026-02-19 08:16 | ~11 h 55 | 57–65 | |
| 65 | 2026-02-19 10:03 | ~1 h 46 | 58–66 | |
| 66 | 2026-02-19 10:26 | ~21 min | 60–66.5 | |
| 67 | 2026-02-19 10:43 | ~17 min | 60.5–66 | |
| 68 | 2026-02-19 11:28 | ~45 min | 54.5–64 | |
| 69 | 2026-02-20 01:00 | ~9 h 01 | 71–80 | |
| 70 | 2026-02-20 01:31 | ~30 min | 63–71.5 | |
| 71 | 2026-02-21 15:56 | ~27 h 38 | 56–68 | |
| 72 | 2026-02-22 07:18 | ~7 h 12 | 72.5–79.5 | |
| 73 | 2026-02-22 07:43 | ~25 min | 61–75.5 | |
| 74 | 2026-02-22 10:49 | ~3 h 06 | 63.5–70 | |
| 75 | 2026-02-23 01:24 | ~7 h 26 | 64–70.5 | |
| 76 | 2026-02-23 01:28 | ~4 min | 62–73.5 | |
| 77 | 2026-02-23 02:37 | ~1 h 08 | 61–65.5 | |
| 78 | 2026-02-24 02:37 | ~13 h 45 | 64–72 | |
| 79 | 2026-02-24 03:06 | ~27 min | 63.5–70 | |
| 80 | 2026-02-24 08:53 | ~4 h | 62–65.5 | |
| 81 | 2026-02-27 04:08 | ~35 h 25 | 62–67 | |
| 82 | 2026-02-27 14:00 | ~9 h 48 | 65–75 | max_cstate=1 re-added, suspend disabled |
| 83 | 2026-02-28 06:05 | ~16 h 03 | 51–55.5 | max_cstate=1, suspend disabled |
| 84 | 2026-03-01 11:19 | ~28 h 36 | 61–67.5 | max_cstate=1, suspend disabled |
| 85 | 2026-03-01 19:00 | ~7 h 38 | 51–54 | max_cstate=1, suspend disabled |

All crashes on 2×48 GB, kernel 6.18.0-fw13, amdgpu.dcdebugmask=0x412. DIMM temps from collectd/spd5118.

Where things stand

Everything I can change on the software side has been tried. The crash reproduces across:

  • 3 kernels (6.14, 6.18, stock 6.11 live USB)
  • With and without amdgpu.dcdebugmask (0x12, 0x412, none)
  • With and without processor.max_cstate=1
  • With and without suspend
  • At DIMM temps from 42 °C to 85 °C
  • During idle and under load

I’m running out of things to try on my end. @Jesse_Darnley, @Matt_Hartley — any update on sync flood investigation? Happy to run any firmware/EC diagnostics or test patches.

I am curious whether this also occurs on Windows, or whether it is an error somewhere in the Linux stack.
That would help distinguish a hardware/firmware issue from a software issue.

I’m sorry you’ve had such a hard time with your mainboard, my 7840U hasn’t had a single crash yet.

Major update: WD SN770 firmware update — zero crashes in 55+ hours

Updating from 102 sync floods in 3.5 months to what may be the fix — or at least a very significant mitigation.

TL;DR

I updated the WD_BLACK SN770 1TB NVMe firmware from 731100WD to 731120WD. Since then: 55 hours and 25 minutes of cumulative awake time across 7 sessions, zero crashes. For context, I was averaging a crash every ~12–15 hours, with some days having 5 crashes.

How I got here

After 102 Data Fabric Sync Flood crashes and exhausting every software-side mitigation (3 kernels, dcdebugmask, processor.max_cstate=1, stock Ubuntu live USB — all crashed), I was running out of options. I have an open RMA with Framework support, but honestly the experience has been frustrating — slow responses, and the suggestions (run the log-helper script, try with one RAM stick at a time in each slot) felt like generic troubleshooting that didn’t account for the extensive testing I’d already done and shared with them. I’d already tried different RAM configs, different kernels, a stock live USB — the data was all there. So I kept investigating on my own, and turned to the PCIe link to the NVMe SSD.

Step 1 — Making PCIe errors visible: The Framework BIOS (via AMD AGESA) refuses to grant AER (Advanced Error Reporting) control to the OS. This means the kernel is completely blind to PCIe errors — they happen silently with no logging, no interrupts, no recovery. I added pcie_ports=native to bypass this and force the kernel’s AER driver to activate.
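If you want to check this on your own machine: the result of the _OSC negotiation is in the kernel log, and after booting with pcie_ports=native the AER service should attach to the root ports. A sketch (the exact wording of the log lines varies by kernel version):

```
# What the firmware granted the OS at boot; look for AER in the _OSC lines
journalctl -k -b 0 | grep -i "_OSC"

# After adding pcie_ports=native and rebooting, the AER port service should register,
# typically logged as "AER: enabled with IRQ ..." on the root ports
journalctl -k -b 0 | grep -i "AER: enabled"
```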

Step 2 — What I found: The NVMe link was generating correctable PCIe errors continuously — about 30 per awake-hour. RxErr (receiver errors) and BadTLP (corrupted packets) on the SSD, Timeout (completion timeouts) on the root port. Errors came in correlated pairs: a corrupted packet arrives → the receiver rejects it → the sender never gets an acknowledgment → timeout. This is the signature of a marginal PCIe link.
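The rate is easy to quantify from the per-device AER counters in sysfs once the kernel owns AER (a sketch, assuming the SSD is nvme0; the aer_dev_* attributes only exist while AER is active):

```
# Resolve the NVMe controller's PCI device and dump its AER counters
NVME_PCI=$(readlink -f /sys/class/nvme/nvme0/device)
cat "$NVME_PCI/aer_dev_correctable"
cat "$NVME_PCI/aer_dev_nonfatal" "$NVME_PCI/aer_dev_fatal"

# Or watch new AER events arrive in real time
journalctl -k -f | grep -iE "AER|RxErr|BadTLP|Timeout"
```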

Step 3 — A community post that connected the dots: A FW16 user in the thread Framework 16 Re-occurring BSOD on this very forum reported that updating their WD SN770 firmware via WD Dashboard fixed their recurring crashes. That post is what directly led me to try this — credit where it’s due.

It made sense of the PCIe errors: the WD SN770 is a DRAM-less NVMe that uses Host Memory Buffer (HMB) — it borrows 200 MB of your system RAM via PCIe for its internal operations. WD issued a critical firmware advisory for HMB bugs causing BSODs on Windows 11 24H2, and the Proxmox/OpenZFS community confirmed HMB problems affect non-Windows OSes too. The mechanism fits perfectly: buggy HMB firmware → erratic PCIe transactions → correctable errors escalate → Data Fabric can’t recover (because AER is disabled) → Sync Flood.

A note on WD’s advisory scope: WD’s advisory only lists the 2 TB models (SN770 2TB, SN770M 2TB) as affected. My drive is a 1 TB — not mentioned in the advisory at all. Yet it uses the same 200 MB HMB (confirmed via nvme id-ctrl), and appears to have been suffering from the exact issue the advisory describes. If the crash-free streak holds, WD’s advisory is incomplete — the 1 TB SN770 should be listed as an affected model, and the issue is not limited to Windows 11 24H2 BSODs. Linux users experiencing Data Fabric Sync Floods would have no reason to think their 1 TB drive needs this update based on WD’s current documentation.
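If you want to check what your own drive asks for, nvme-cli reports the Host Memory Buffer fields in the controller identify data (a sketch; hmpre/hmmin are in 4 KiB units, so divide by 256 for MiB):

```
# HMPRE = preferred HMB size, HMMIN = minimum HMB size the drive will accept
sudo nvme id-ctrl /dev/nvme0 | grep -Ei "hmpre|hmmin"

# How much host memory the kernel actually handed to the controller at boot
journalctl -k -b 0 | grep -i "host memory buffer"
```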

Step 4 — The firmware update: On March 8, immediately after crash #102, I updated to 731120WD. No crashes since. This is by far the most significant change in my entire 3.5-month investigation.

How to update (Linux, no Windows needed)

```
# Download firmware
curl -k -o /tmp/731120WD.fluf \
    "https://wddashboarddownloads.wdc.com/wdDashboard/firmware/WD_BLACK_SN770_1TB/731120WD/731120WD.fluf"

# Flash to firmware slot 2 (slot 1 keeps old firmware as fallback)
sudo nvme fw-download /dev/nvme0 -f /tmp/731120WD.fluf
sudo nvme fw-commit -s 2 -a 3 /dev/nvme0

# Reboot to activate
sudo reboot
```

References: sorend’s gist, Framework community WD update guide. The -k flag on curl is needed because WD’s CDN SSL certificate was expired at the time of download.
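To confirm the flash worked after the reboot, nvme-cli can show which firmware slot is active and what each slot contains (a sketch):

```
# frs1/frs2 list the firmware revision stored in each slot; afi indicates the active one
sudo nvme fw-log /dev/nvme0

# The controller should now report the new revision
sudo nvme id-ctrl /dev/nvme0 | grep -i "^fr "
```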

Who should try this

If you have a WD NVMe drive (SN770, SN770M, SN850X, or similar DRAM-less models) and are experiencing sync floods — check your firmware version and update if possible. These drives all use HMB, and HMB bugs can generate the kind of PCIe errors that the Data Fabric would choke on.

Check your current firmware:

```
sudo nvme id-ctrl /dev/nvme0 | grep -i "fr "
```

Caveats

  • 55 hours is promising but not definitive proof. My longest previous streak was ~64 hours (before crash #19). I’ll continue monitoring and update this thread.
  • The PCIe correctable errors (RxErr, BadTLP) may still be present after the firmware update — what matters is whether the firmware update eliminates the conditions that cause them to escalate to uncorrectable errors that trigger a Sync Flood.
  • This may not explain all sync floods across all hardware configurations. But for anyone with a WD DRAM-less NVMe, the firmware is the lowest-hanging fruit to try.

Framework and AMD: this needs your attention

@Jesse_Darnley @Matt_Hartley — I’m flagging this explicitly because after 3.5 months and 102 crashes, this is the first concrete, actionable lead pointing to a specific component with a specific fix. Not a kernel parameter workaround, not a “try this and hope” — a firmware update with a plausible mechanism backed by PCIe error data and a WD critical advisory. I’d really appreciate acknowledgment and feedback from the Framework engineering team.

Specifically:

  1. Should this be relayed to the AMD BIOS/firmware team? AMD told us on GitHub that debugging sync floods “needs to be done by Framework BIOS team.” The PCIe AER data and NVMe firmware correlation give them something concrete to investigate — this is no longer a “random unreproducible crash.”

  2. Should enabling AER in the BIOS be considered? The current AMD AGESA configuration refuses to grant PCIe Advanced Error Reporting control to the OS. This means every Framework AMD laptop is completely blind to PCIe errors — they happen silently with no logging, no interrupts, no recovery. I only found the correctable error stream on my NVMe link by forcing pcie_ports=native to bypass the BIOS. Without that, I’d still be in the dark after 102 crashes. Enabling AER would immediately give every affected user — and your own support team — visibility into what’s going wrong.

  3. Should you add NVMe firmware version to the sync flood diagnostic workflow? When a user reports 0x08000800, the first question should be: what NVMe drive and firmware version? WD DRAM-less drives (SN770, SN770M, SN850X) use Host Memory Buffer and should be flagged for firmware updates.

  4. Is there internal data on NVMe models across sync flood RMAs? If WD DRAM-less drives are overrepresented, that would confirm this finding and could prevent unnecessary mainboard replacements.

This issue has been affecting users across FW13, FW16, multiple AMD CPUs, and multiple configurations for over a year. I know I’m not alone in feeling that the community response to sync floods has been lacking — on GitHub, some users have switched to Intel boards, others have expressed real disappointment in how Framework has handled this. A clear, engaged response here would matter to a lot of people who are watching these threads and wondering whether to keep trusting the platform.


@Valentin_Lab
That is a really good find.
After I found out (and proved) that PCIe devices can cause a sync flood, what you have found here makes a lot of sense. I was not aware that the FW BIOS was suppressing PCIe errors.
A sync flood causes a forced reboot. I don’t see how Windows can do a BSOD for a sync flood.
Note: it is quite normal for a BIOS to suppress PCIe errors, because not many BIOSes support PCIe error recovery methods. But the Linux kernel does support PCIe error recovery.
This might also help some oculink users track down problems.
@Mario_Limonciello In case this is useful for you.


I’ve been fighting similar crashes on a FW13 running the AMD 350. It’s made the machine unusable for work since I never know when the rug will get pulled. I’ve had nearly every piece of hardware replaced thanks to FW’s great RMA help, but no real resolution. I found a reliable repro by transcoding video in Shotcut. I also found that the problem is less frequent under Fedora 43 than it was under Ubuntu. The only piece that hasn’t been replaced yet is my WD_BLACK SN850X. Thanks for the guidance on the firmware; I’ll update mine and see if it makes any difference. This feels like a really interesting common thread. I actually found this thread while doing some research as I install Windows on an older SSD I had lying around, to see if I could repro the problem there, but now I have a new avenue to explore.

Edit - There was a new firmware available for my drive. I’ve updated and we’ll see how it goes.

@Quentin_Hartman

Please try the kernel parameters:
pcie_ports=native pcie_ecrc=on

Then see if you get any AER errors in the logs on Linux.
If you are seeing AER errors, then it should tell you what device is causing the problem.
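If it helps, a quick sanity check after rebooting is to confirm both parameters are on the running kernel’s command line and then watch the kernel log for AER events (a sketch):

```
# Confirm the parameters took effect
grep -oE "pcie_ports=native|pcie_ecrc=on" /proc/cmdline

# Watch live for AER activity (correctable errors, BadTLP, RxErr, timeouts, ...)
journalctl -k -f | grep -iE "AER|PCIe Bus Error"
```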

I’ll add that into the mix. Thanks!

OK, so after updating the firmware on the drive, I’ve now had three workdays with no rugpulls, so this is a substantial improvement.

@James3 I added those kernel parameters. When I search my logs for AER, I see a bunch of messages like this:

```
Mar 14 21:35:36 emerald kernel: acpi PNP0A08:00: _OSC: platform does not support [SHPCHotplug AER]
```
but they are all timestamped from before I updated the firmware on the drive and added these params. Am I looking for the right thing?

Before you upgraded the firmware, the AER errors in the logs would have looked something like the lines below. Since you have now upgraded, you should not see any AER errors in the logs, so it looks like the firmware update fixed the problem for you.

```
[ 7542.204821] pcieport 0000:00:04.1: AER: Correctable error message received from 0000:63:01.0
[ 7542.204837] pcieport 0000:63:01.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
[ 7542.204841] pcieport 0000:63:01.0: device [8086:15da] error status/mask=00000080/00002000
[ 7542.204846] pcieport 0000:63:01.0: [ 7] BadDLLP
```

I’ve had something like 70 powered-on hours since doing this update, including some heavy activities that previously reproduced the rugpull reliably, and haven’t had a single one. At this point I consider the SSD firmware update a solution to this problem. I would love to see the support team include this possibility in their troubleshooting process for problems like mine. If they had known to ask about this and had guidance for upgrading SSD firmware, this problem would likely have been solved for me months ago. Huge thanks to @Valentin_Lab for this post; my machine would still be essentially unusable if not for the information you shared!


I’ve reached out to Framework directly on X: link. If you’ve been affected by this issue, amplifying would help get their attention.