[0x08000800] Uncorrected error causing a data fabric sync flood event

Which Linux distro are you using?

At the moment I am using Fedora 43, but the same behavior was encountered using Ubuntu 35.

Which release version?
midu@framework:~/Downloads$ cat /etc/os-release
NAME="Fedora Linux"
VERSION="43 (Workstation Edition)"
RELEASE_TYPE=stable
ID=fedora
VERSION_ID=43
VERSION_CODENAME=""
PRETTY_NAME="Fedora Linux 43 (Workstation Edition)"
ANSI_COLOR="0;38;2;60;110;180"
LOGO=fedora-logo-icon
CPE_NAME="cpe:/o:fedoraproject:fedora:43"
DEFAULT_HOSTNAME="fedora"
..redacted..
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=43
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=43
SUPPORT_END=2026-12-02
VARIANT="Workstation Edition"
VARIANT_ID=workstation

(If rolling release, last date updated?)

Which kernel are you using?

6.17.12-300.fc43.x86_64

Which BIOS version are you using?

├─System Firmware:
│ │ Device ID: f8571b257837e7537ea0508dc9793e90f594fdb2
│ │ Summary: UEFI System Resource Table device (Updated via capsule-on-disk)
│ │ Current version: 0.0.3.4
│ │ Minimum Version: 0.0.1.0
│ │ Vendor: Framework (DMI:INSYDE Corp.)
│ │ Update State: Success
│ │ GUID: eb68dbae-3aef-5077-92ae-9016d1f0c856
│ │ Device Flags: • Internal device
│ │ • Updatable
│ │ • System requires external power source
│ │ • Supported on remote server
│ │ • Needs a reboot after installation
│ │ • Cryptographic hash verification is available
│ │ • Device is usable for the duration of the update
│ │ Device Requests: • Message

Which Framework Desktop model are you using? (AMD Ryzen™ AI Max 300 Series)

midu@framework:~/Downloads$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S

The issue seems to be reproducible on my device by running the following command:

sudo stress-ng \
  --vm 16 \
  --vm-bytes 105G \
  --vm-hang 0 \
  --timeout 21600s \
  --metrics-brief \
  --ftrace

or

sudo glmark2-es2 --run-forever

I have also been testing the following:

sudo stress-ng \
  --cpu 80% \
  --cpu-method all \
  --timeout 21600s \
  --metrics-brief \
  --ftrace

What I can say is that the CPU load does not reproduce the issue. The memory load, though, reproduces it quite easily.

Sorry, and where are you seeing this message?

It's part of the journalctl output:

midu@framework:~/Downloads$ sudo journalctl -k  --no-pager | grep -i -E "mce|fabric|uncorrected"
Journal file /var/log/journal/6c5bbdcdbe414384a81a731b6d9d46de/user-1000@000646dd1230be39-606782af04095396.journal~ is truncated, ignoring file.
Dec 30 18:00:13 framework.frntdeu1.pop.starlinkisp.net kernel: x86/amd: Previous system reset reason [0x08000800]: an uncorrected error caused a data fabric sync flood event
Dec 30 18:00:35 framework.frntdeu1.pop.starlinkisp.net kernel: MCE: In-kernel MCE decoding enabled.

You get the sync flood message when using newer kernels.
It reads the S5-RESET-STATUS register at boot time and decodes it.
It reports on why the previous reboot happened.
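
As a rough illustration of that decoding step, here is a minimal Python sketch that pulls the hex value out of a kernel log line like the one quoted above and lists which bits are set. The function names are made up for this example, and it deliberately does not map bits to meanings; the kernel message already supplies the text for the reason it recognised.

```python
import re

# Matches the value in "Previous system reset reason [0x08000800]: ..."
RESET_RE = re.compile(r"Previous system reset reason \[(0x[0-9a-fA-F]+)\]")

def parse_reset_reason(line):
    """Return the reset-reason value as an int, or None if the line has none."""
    m = RESET_RE.search(line)
    return int(m.group(1), 16) if m else None

def set_bits(value):
    """Return the positions of all bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]

# Sample line taken from the journalctl output quoted in this thread.
line = ("kernel: x86/amd: Previous system reset reason [0x08000800]: "
        "an uncorrected error caused a data fabric sync flood event")
value = parse_reset_reason(line)
print(hex(value), set_bits(value))
```

For 0x08000800 this reports bits 11 and 27 as set.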

hi @James3 ,

I have to admit, the reason why the message appears is not the focus of my thread; what I am after is the culprit of the restart. As far as I can tell, the error code indicates a serious hardware issue, often related to the CPU or memory. This kind of error can lead to system instability, including random reboots or crashes, and may require hardware diagnostics or replacement to resolve.

[1] Ryzen 5 5600 - "Data Fabric Sync Flood" / MCE crashes (Stable on Windows) - Issues & Assistance - CachyOS Forum

[2] FW16 Freeze then Reboot (FTR) S5_RESET_STATUS = 0x08000800 <- Sync Flood. · Issue #41 · FrameworkComputer/SoftwareFirmwareIssueTracker · GitHub

Both of these discussion threads [1, 2] describe exactly the behavior I am facing, so I am looking for a solution or a more specific configuration, since I am using the default GRUB/BIOS configuration.

OK, so I have done some experiments and found the following:

1. By default, Fedora gets installed using:
  • Fully dynamic GPU VM
  • Unbounded UMA growth

2. Using USB-C as the video output and stressing the GPU and/or RAM using the following commands:

midu@framework:~$ sudo stress-ng \
  --vm 16 \
  --vm-bytes 105G \
  --vm-hang 0 \
  --timeout 21600s \
  --metrics-brief \
  --ftrace
stress-ng: info:  [24923] setting to a 6 hours run per stressor
stress-ng: info:  [24923] dispatching hogs: 16 vm
stress-ng: info:  [24923] note: 32 cpus have scaling governors set to powersave and this may impact performance; setting /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor to 'performance' may improve performance
stress-ng: info:  [24924] vm: using 6.56G per stressor instance (total 105G of 112.96G available memory)

and

midu@framework:/apps/open-notebook$ sudo glmark2-es2 --run-forever
=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    Radeon 8060S Graphics (radeonsi, gfx1151, LLVM 21.1.5, DRM 3.64, 6.17.12-300.fc43.x86_64)
    GL_VERSION:     OpenGL ES 3.2 Mesa 25.2.7
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================

results in the 0x08000800 fault and the node reboots.

Knowing this [1 and 2], the two solutions I have identified so far are the following:

  1. Switch from using a USB-C video output to HDMI or DisplayPort.
  2. Add the following kernel arguments:
sudo grubby --update-kernel=ALL --args="amdgpu.gttsize=8192 amdgpu.vm_update_mode=3"

and in BIOS:

  • Set UMA Frame Buffer = 8 GB

  • Disable Auto / Dynamic
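
To confirm after a reboot that the kernel arguments were actually applied, something like the following Python sketch could be used. It is only an illustration: `missing_args` is a name invented here, and the sample command line is made up; on a live system you would pass in the contents of /proc/cmdline instead.

```python
# Arguments taken from the grubby command above.
WANTED = {"amdgpu.gttsize": "8192", "amdgpu.vm_update_mode": "3"}

def missing_args(cmdline, wanted=WANTED):
    """Return the wanted key=value arguments that are absent from cmdline."""
    # Keep only key=value tokens; bare flags such as "ro" are ignored.
    present = dict(tok.split("=", 1) for tok in cmdline.split() if "=" in tok)
    return [f"{k}={v}" for k, v in wanted.items() if present.get(k) != v]

# Hypothetical command line where only one of the two arguments is set.
sample = "ro rhgb quiet amdgpu.gttsize=8192"
print("missing:", missing_args(sample) or "none")
```

An empty result means both arguments are present with the expected values.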

Note that this behavior can be triggered much more easily on Linux than on Windows.

Applying either of the two solutions resulted in both tests passing the full 21600s timeout.


@midu16
The [2] was raised by me, so I am familiar with the problem.
The real problem is making it reproducible.
At the moment, for me, it happens rarely, i.e. once a week or something like that.
It makes it very difficult for a FW or AMD engineer to fix it.
What we need is something that triggers the problem in a reproducible way.
If FW or AMD engineers can reproduce the problem, it will get fixed.
I tried following your instructions below, but it does not reproduce the problem on my FW16, with no dGPU.


@James3 ,

Under the conditions I describe, this can be reproduced in less than 15 minutes. For me, this made the Framework Desktop unusable.

hi @James3 ,

I am wondering, on your FW16, what CPU are you using?

FW16 7840HS cpu