Same issue on Linux. Plus, when on battery the system is unstable, too.
@nrp
I have a version of EC firmware running on my FW16 that can dump the last 4096 Port80 codes. This is enough to survive a reboot. So the next time I see the Freeze then Reboot (FTR) I will capture the Port80 codes for you.
I have posted a list of de-duplicated 32-bit Port80 codes from normal reboots and cold boots for reference.
So, if FW can add the meaning to any of them, it would be helpful.
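For anyone who wants to compare their own captures against that list, here is a minimal Python sketch of the two ideas: keeping only the last 4096 codes, and de-duplicating them in first-seen order. This is not the EC firmware code. The function names and sample values are made up for illustration.

```python
from collections import deque

HISTORY_SIZE = 4096  # number of most recent codes kept, as described above


def make_history(size=HISTORY_SIZE):
    """Return a buffer that drops the oldest code once it is full."""
    return deque(maxlen=size)


def dedupe_codes(codes):
    """Return each distinct code once, in the order it was first seen."""
    seen = set()
    unique = []
    for code in codes:
        if code not in seen:
            seen.add(code)
            unique.append(code)
    return unique


def format_code(code):
    """Format a 32-bit code as 8 lowercase hex digits."""
    return f"{code & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    history = make_history()
    # Hypothetical sample values, not real Port80 codes from a FW16.
    for code in [0x00000001, 0x00000002, 0x00000001, 0x00000003]:
        history.append(code)
    for code in dedupe_codes(history):
        print(format_code(code))
```

De-duplicating in first-seen order keeps the rough boot sequence readable, which makes it easier to spot a code that only shows up before an FTR.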
I am having the same issue of random crashes and reboots, especially after the 3.05 bios…
Thanks for helping capture this. We’ve also requested more detail from AMD on the Port 80 codes, since we don’t have a decoder for many of them either unfortunately.
One other area of information that will be helpful on this is context from the Windows Minidump if there is one available. For folks hitting this, could you check if there are files in C:/Windows/Minidump/ that correspond to the timing of a freeze/crash and reboot?
If so, you can extract additional information using the WinDbg tool. Dell has a guide on how to set it up here: How to Use Windows Debugger to Troubleshoot Bluescreens | Dell US
You can copy the .dmp file to another location like your home folder, then open up the WinDbg tool, click File, “Open Dump File” and select the .dmp file and open it. You can then click on the “!analyze -v” link that shows up in the log. This will show the stacktrace of the process that was running when the system crashed.
We may also request the .dmp file itself in the future to provide to AMD if the process ends up being the AMD driver.
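To check quickly whether any dump lines up with a crash, a small Python sketch like the following lists the .dmp files with their modification times, newest first. It assumes the default folder mentioned above and that Python is installed. The function name is made up for illustration, and reading the folder may need an elevated prompt.

```python
from datetime import datetime
from pathlib import Path

# Default location mentioned above; adjust if your dumps are stored elsewhere.
MINIDUMP_DIR = Path("C:/Windows/Minidump")


def list_minidumps(dump_dir=MINIDUMP_DIR):
    """Return (modified_time, path) for each .dmp file, newest first."""
    dump_dir = Path(dump_dir)
    if not dump_dir.is_dir():
        return []
    dumps = [
        (datetime.fromtimestamp(p.stat().st_mtime), p)
        for p in dump_dir.glob("*.dmp")
    ]
    return sorted(dumps, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    for modified, path in list_minidumps():
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {path.name}")
```

If a timestamp matches the time of a freeze and reboot, that is the file to open in WinDbg.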
For what it is worth, I have been having the same problem. I left the laptop (FRWK16) running, doing nothing in particular, and when I came back to it the whole thing had locked up. When I tried moving the mouse and left- and right-clicking, absolutely nothing happened; the pointer would not move. Then the laptop powered off completely in a split second. Not a normal shutdown in any sense. The laptop then rebooted without being touched, to the login screen.
No error logs of any sort could be found. This happens once a month and I cannot identify any particular circumstance; it seems to be very random. The BIOS is up to date and all drivers are the very latest ones issued by AMD. I am getting a “failed to load HSP firmware” message in the Windows logs, which may indicate a BIOS issue.
Please do not use the latest drivers from AMD. I had a lot of issues with them, and a lot of people complain about them. In my case, I got the best stability with the AMD drivers delivered in the driver bundle from Framework. There is still an issue with sleep, but I’ve disabled it and use only hibernate. When I have more time, I can re-enable sleep mode and log the output from the EC debug console.
Same here. Since 24H2 my AMD FW16 is just a brick. I had to disable the AMD audio compressor and still can’t sleep, hibernate, reboot or power on reliably …
It’s a total mess …
And there is nothing we can do … because it’s related closely to FW16 + AMD DRIVER + Windows 24H2
oh damn - i forgot I also had to disable the amd audio compressor! I had to do that also. Urgh
Hi,
I have been updating progress on the issue:
My ideas so far:
- I have fixed various bugs in the EC code so that it does a better job of catching port80 codes and not missing them. I have also added a port80 code badbadxx when the EC detects a hardware overrun.
- I have added some port80 codes to the Linux kernel so that I can tell whether some bug in the Linux kernel requested the reboot or whether it rebooted without the kernel asking it to.
- While working on the first two items, I spotted a performance problem where the EC can actually slow down the CPU. I will investigate that as well, as I don’t think it is a feature we need.
So, still waiting on my laptop to FTR, but at least more ready to diagnose why.
Thanks for continuing to dig further into this, and thanks for the Github issue filings.
We have been doing some work over here trying to track down the “Freeze then Reboot” problem.
We have discovered something interesting:
Note: when it says “Answer: …” it means someone from AMD provided the answer.
Value seen during a Freeze-then-reboot (FTR):
S5_RESET_STATUS = 0x08000800
Answer: Sync flood.
If it helps, the above S5_RESET_STATUS = 0x08000800
was seen on a:
Framework Laptop 13 powered by AMD Ryzen™ 7040 Series Processors:
Ryzen 7 7840U and Ryzen 5 7640U
And also a FW16 AMD 7840HS.
Explanation from AMD:
Bit 27 means this is a reboot due to “sync flood”. Here’s a description:
A drastic method of reporting errors in which a device transmits Sync
packets continuously until reset occurs. Each device on the chain that
detects the Sync flood repeats the pattern on all links. A reset is
required to recover from a Sync flood. Sync flood is analogous to SERR#
(System Error) on a PCI bus.
Debugging this will need to be done with a custom BIOS that disables
reboot on Syncflood so that the full state of the system could be
analyzed to characterize it.
In short; you unfortunately won’t be able to debug this any further
yourself; it needs to be done by Framework BIOS team.
The best thing you as an end user could do is find the pattern of
activity that’s tripping this (if there is one) so that Framework is
able to reproduce it.
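The bit arithmetic behind that answer can be checked with a few lines of Python. This is only a sketch. The meaning of bit 27 comes from the AMD explanation above, and the function names are made up here. Nothing in this thread says what the other set bit (bit 11) means, so it is left undecoded.

```python
SYNC_FLOOD_BIT = 27  # per the AMD explanation quoted above


def set_bits(value):
    """Return the positions of all set bits in a 32-bit value, lowest first."""
    return [bit for bit in range(32) if value & (1 << bit)]


def is_sync_flood(status):
    """Return True if the sync flood bit is set in S5_RESET_STATUS."""
    return bool(status & (1 << SYNC_FLOOD_BIT))


if __name__ == "__main__":
    status = 0x08000800  # value seen during the FTR
    print(set_bits(status))       # [11, 27]
    print(is_sync_flood(status))  # True
```

Anyone who captures a different S5_RESET_STATUS value after an FTR can run it through the same check to see whether bit 27 is set again.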
I’ll continue to carry the patch mentioned in the github issue to see if perhaps the reset status code differs when the laptop FTRs the next time.
@James3 BTW, thanks a bunch for your help.
@sydney
We only have a sample size of one for the S5_RESET_STATUS = 0x08000800.
So, having more people reporting when they see it would be great.
As the AMD person suggests, anything we, the users, can do to help make it more reproducible so that the FW team can see it happening would help.
So, keep note of exactly which cards were in which slot, and what you were doing at the time of the FTR.
any update from the framework team on this? I haven’t been able to use my computer since mid-February because of this issue.
This weird behavior, which I call a “low power trip”, has happened to me as well, in one form or another.
I don’t have a GPU, so everything is off of the 7840.
Sometimes it will draw 54 W under gaming scenarios, peaking at maybe around 90 °C. Sometimes it will draw 45 W, peaking around 80 °C. Very rarely it will drop to something like 15 W, absolutely destroying the framerate.
Ironically, this issue is also present on a completely different computer, my Thinkpad T14S with a Ryzen 4650U. It will draw around 25 W, and then it will drop to 8 W.
On the T14S, I suspect some power delivery overheating. On the Framework, might be RAM?
DIMM0 temp also gets alarmingly high (95C) during use.
On my F16 (no discrete GPU), I am close to 100% certain that the reboots happen at a random moment ON THE CONDITION THAT I first plug it into or unplug it from the wall. Otherwise, as long as it remains plugged in or on battery, it will be rock stable regardless of utilization, going to sleep and waking up, etc.
I’m having a bit of trouble understanding what you’re saying.
Do you mean that after boot, whether your device is plugged in or unplugged, it’s stable,
but if you later plug in or unplug your adapter during the session (without rebooting), that’s when it becomes unstable?
Exactly. Again, that’s just my observation though.
If plugged in, laptop will stay stable forever.
If on battery, laptop will stay stable forever (as long as the battery allows).
As soon as I unplug it or plug it, it’s a question of minutes or hours before a sudden reboot happens, for no apparent reason.
OK, this is my experience too. At first I thought it was something to do with powersave (the power profile setting in KDE Plasma), but I later tried balanced while unplugged and the issue still happens.
Sometimes the screen just goes black and the laptop turns off, and I have to turn it back on manually; other times the screen freezes, then it reboots after about 10-20 seconds.
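For Linux users who want to test the plug/unplug pattern described above, one option is to log charger state changes with timestamps and compare them against reboot times afterwards. The sketch below reads the `online` attribute that the kernel exposes under /sys/class/power_supply. Supply names vary between machines, so it reads every supply that has one. The function names and the log file path are made up for illustration.

```python
import os
from datetime import datetime
from pathlib import Path

POWER_SUPPLY_DIR = Path("/sys/class/power_supply")


def read_online_states(base=POWER_SUPPLY_DIR):
    """Return {supply_name: online_value} for supplies with an 'online' file."""
    states = {}
    base = Path(base)
    if not base.is_dir():
        return states
    for supply in sorted(base.iterdir()):
        try:
            states[supply.name] = (supply / "online").read_text().strip()
        except OSError:
            continue  # e.g. batteries, which have no 'online' attribute
    return states


def log_if_changed(logfile, previous, base=POWER_SUPPLY_DIR):
    """Append a timestamped line if the states changed; return current states."""
    current = read_online_states(base)
    if current != previous:
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(logfile, "a") as f:
            f.write(f"{stamp} {current}\n")
            f.flush()
            os.fsync(f.fileno())  # push to disk so the line survives a sudden reboot
    return current


if __name__ == "__main__":
    print(read_online_states())
```

Calling `log_if_changed("ac-log.txt", previous)` every few seconds in a loop, passing back the value it returned last time, gives a record of each plug and unplug event that can be lined up with the time of the next FTR.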