You get the sync flood message when using newer kernels: they read the S5-RESET-STATUS register at boot time, decode it, and report why the previous reboot happened.
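"Decoding" here means interpreting the register value, presumably as a bit mask in which each set bit flags one reset cause. The minimal sketch below assumes a 32-bit register and only lists which bits are set in a value such as the 0x08000800 reported in this thread. It does not reproduce the bit-to-reason table, which lives in the kernel source and AMD's documentation. `set_bits` is a hypothetical helper name.

```python
def set_bits(value: int, width: int = 32) -> list[int]:
    """Return the indices of the bits that are set in a register value."""
    # Walk each bit position from least to most significant and keep the set ones.
    return [bit for bit in range(width) if value >> bit & 1]


if __name__ == "__main__":
    status = 0x08000800  # reset status value quoted in this thread
    print(f"{status:#010x} -> bits set: {set_bits(status)}")
```

Each reported index can then be looked up in the kernel's own reason table for your kernel version.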
I have to admit that why the message appears is not the focus of my thread; what I mainly care about is the culprit behind the restart. So far, the error code indicates a serious hardware issue, often related to the CPU or memory. This error can lead to system instability, including random reboots or crashes, and may require hardware diagnostics or replacement to resolve.
All of these discussion threads [1-2] describe exactly the behavior I am facing, so I am looking for a solution, or at least a more specific configuration, since I am using the default GRUB/BIOS configuration.
OK, so I have done some experiments and found the following:
1. By default, Fedora is installed with:
   - Fully dynamic GPU VM
   - Unbounded UMA growth
2. Using USB-C as the video output and stressing the GPU and/or RAM with the following command:
```
midu@framework:~$ sudo stress-ng \
--vm 16 \
--vm-bytes 105G \
--vm-hang 0 \
--timeout 21600s \
--metrics-brief \
--ftrace
stress-ng: info: [24923] setting to a 6 hours run per stressor
stress-ng: info: [24923] dispatching hogs: 16 vm
stress-ng: info: [24923] note: 32 cpus have scaling governors set to powersave and this may impact performance; setting /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor to 'performance' may improve performance
stress-ng: info: [24924] vm: using 6.56G per stressor instance (total 105G of 112.96G available memory)
```
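The 6.56G per-instance figure in the stress-ng output is consistent with `--vm-bytes 105G` being split evenly across the `--vm 16` workers. A quick arithmetic check, assuming an even split (which the reported numbers suggest), also shows how much of the 112.96G of available memory the run claims. `per_instance` is a hypothetical helper name.

```python
def per_instance(total: float, workers: int) -> float:
    """Memory each vm worker maps when the total is split evenly."""
    return total / workers


total, workers, available = 105.0, 16, 112.96  # values from the stress-ng run
share = per_instance(total, workers)
# Print the per-worker share and the fraction of available memory in use.
print(f"{share:.2f}G per instance, {total / available:.1%} of available memory")
```

In other words, the run keeps roughly 93% of the available memory under constant pressure for six hours.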
@midu16
I raised [2], so I am familiar with the problem.
The real problem is making it reproducible.
At the moment, it happens rarely for me, roughly once a week.
It makes it very difficult for a FW or AMD engineer to fix it.
What we need is something that triggers the problem in a reproducible way.
If FW or AMD engineers can reproduce the problem, it will get fixed.
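To put numbers on why a once-a-week failure is hard to chase, treat crashes as a Poisson process with a mean of one per week. This is a simplifying assumption, not a measured rate. Under it, the chance of seeing at least one crash during a test run of t hours is 1 - exp(-t / 168). The sketch evaluates that for a few run lengths, including a 6-hour run like the 21600s stress-ng timeout. `p_at_least_one` is a hypothetical helper name.

```python
import math


def p_at_least_one(hours: float, mean_hours_between: float) -> float:
    """Chance of at least one event in `hours`, assuming a Poisson process."""
    return 1.0 - math.exp(-hours / mean_hours_between)


week = 7 * 24  # assumed mean time between crashes: about one week, in hours
for hours in (6, 24, 168):
    print(f"{hours:>4} h run: {p_at_least_one(hours, week):.1%}")
```

A trigger that raises the rate, rather than a longer run, is what makes a bug practical to reproduce.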
I tried following your instructions, but they do not reproduce the problem on my FW16 (no dGPU).
So far we have found two possible causes of the [0x08000800] sync flood:
1. AER errors on PCIe devices.
2. Made more likely when watching videos.
3. Made more likely with stress-ng and glmark2-es2.
There might be a common link between 2 and 3: a sync flood may be more likely when the GPU is making heavier use of host RAM.
In 2, watching videos obviously uses a lot of host RAM for frame buffers, etc.
In 3, stress-ng and glmark2-es2 obviously use a lot of host RAM.
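To check whether cause 1 lines up with a given reset, a kernel log can be scanned for the bracketed reset code and for AER lines. In this minimal sketch, the `reset reason [0x...]` and `AER:` patterns are assumptions about the message format and may need adjusting to what your log actually shows. `scan_log` is a hypothetical helper name.

```python
import re

# Assumed patterns: a bracketed hex reset code, and the "AER:" prefix used by
# PCIe Advanced Error Reporting messages. Adjust both to match your real log.
RESET_RE = re.compile(r"reset reason \[(0x[0-9a-fA-F]+)\]")
AER_RE = re.compile(r"\bAER:")


def scan_log(lines):
    """Return (list of reset codes found, number of AER lines) for a kernel log."""
    codes, aer_lines = [], 0
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            codes.append(int(match.group(1), 16))  # parse the hex code
        if AER_RE.search(line):
            aer_lines += 1
    return codes, aer_lines
```

Because `scan_log` accepts any iterable of lines, you can save the output of `journalctl -k` to a file and pass the open file handle directly.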
A recent Linux kernel patch might help with 2 and 3: it relaxes the latency expectations the GPU has when accessing host RAM.
Since you can reproduce the problem on your FW Desktop quite easily, I would recommend trying the Linux kernel patch to see if it helps.
The potentially helpful patch would be something like this one:
That one is for DCN35. I don’t know which DCN the FW Desktop has, so a patch specific to that one would be needed.
Edit: The patch above probably does not apply to you. See Mario’s message below.
For future reference, this is documented in the kernel documentation.
That being said, the latency changes were for underflow. To me, they are most likely related to the additional latency of DDR modules compared with LPDDR. Most mobile designs use LPDDR; socketed DDR is less common outside of Framework.
If that hypothesis is correct, I wouldn’t expect changing the memory self-refresh timing to influence an issue on the Framework Desktop (it’s all LPDDR).