FRWK16 - Random Crash then Reboots

Same issue on Linux. The system is also unstable when running on battery.

@nrp
I have a version of the EC firmware running on my FW16 that can dump the last 4096 Port80 codes. That buffer is large enough that the codes from before the crash are still available after a reboot. So the next time I see the Freeze then Reboot (FTR), I will capture the Port80 codes for you.
I have posted a de-duplicated list of 32-bit Port80 codes from normal reboots and cold boots for reference.
If Framework can add the meaning of any of them, that would be helpful.
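
For anyone comparing captures by hand, here is a minimal Python sketch of the de-duplication and of checking a crash capture against the reference list. It assumes the dump has already been parsed into a list of 32-bit integers; the function names are hypothetical and not part of any EC tooling.

```python
def dedupe_port80(codes):
    """Return the codes in first-seen order with repeats removed."""
    seen = set()
    unique = []
    for code in codes:
        if code not in seen:
            seen.add(code)
            unique.append(code)
    return unique


def unseen_codes(capture, reference):
    """Return codes from a capture that never appear in the reference list."""
    known = set(reference)
    return dedupe_port80(code for code in capture if code not in known)


def format_codes(codes):
    """Format each code as 8 hex digits, the usual way to print 32-bit values."""
    return [f"{code:08x}" for code in codes]
```

Codes returned by `unseen_codes` for an FTR capture would be the first ones worth asking about, since they did not occur in any normal reboot or cold boot.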

Might be overheating. I remember having an episode or two of unreasonably low fps in gaming (it dropped to about 20 fps in an otherwise 60+ fps game). I went into CoreTemp and it reported a hottest-ever reading of 105 degrees.

This was before I fixed the thermal mod, when I was getting wildly inconsistent temps.

I don’t expect the current re-paste to give me much trouble. The thing hits 85C at 54W full-package power, and at 50W on the CPU only.

I am having the same issue of random crashes and reboots, especially after the 3.05 BIOS…

Thanks for helping capture this. We’ve also requested more detail from AMD on the Port 80 codes, since we don’t have a decoder for many of them either unfortunately.

One other area of information that will be helpful on this is context from the Windows Minidump if there is one available. For folks hitting this, could you check if there are files in C:/Windows/Minidump/ that correspond to the timing of a freeze/crash and reboot?

If so, you can extract additional information using the WinDbg tool. Dell has a guide on how to set it up here: How to Use Windows Debugger to Troubleshoot Bluescreens | Dell US

You can copy the .dmp file to another location like your home folder, then open up the WinDbg tool, click File, “Open Dump File” and select the .dmp file and open it. You can then click on the “!analyze -v” link that shows up in the log. This will show the stacktrace of the process that was running when the system crashed.

We may also request the .dmp file itself in the future to provide to AMD if the process ends up being the AMD driver.
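
As a rough aid for the Minidump check described above, the following Python sketch lists `.dmp` files whose modification time falls near a known crash time. The function name, the 10-minute window, and the use of file modification time as the crash timestamp are assumptions for illustration; reading the Minidump folder may also require an elevated prompt.

```python
from datetime import datetime, timedelta
from pathlib import Path


def dumps_near(crash_time, dump_dir=Path("C:/Windows/Minidump"),
               window=timedelta(minutes=10)):
    """Return (path, modified) pairs for .dmp files written near crash_time."""
    if not dump_dir.is_dir():
        return []  # No Minidump folder means no dumps were written.
    matches = []
    for path in sorted(dump_dir.glob("*.dmp")):
        # Assumption: the file's modification time approximates the crash time.
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if abs(modified - crash_time) <= window:
            matches.append((path, modified))
    return matches


if __name__ == "__main__":
    # Example: look for dumps written within 10 minutes of now.
    for path, modified in dumps_near(datetime.now()):
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {path}")
```

An empty result is still useful information: it suggests the reboot happened without Windows getting the chance to write a dump.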

For what it is worth, I have been having the same problem. I left the FW16 running, doing nothing in particular, and when I came back to it the whole thing had locked up. Moving the mouse and left or right clicking did absolutely nothing; the pointer would not move. Then the laptop powered off completely in a split second, not a normal shutdown in any sense. It then rebooted to the login screen without being touched.
No error logs of any sort could be found. This happens about once a month and I cannot identify any particular circumstance; it seems to be very random. The BIOS is up to date and all drivers are the very latest ones issued by AMD. I am getting a “failed to load HSP firmware” message in the Windows logs, which may indicate a BIOS issue.

Please do not use the latest drivers from AMD. I had a lot of issues with them, and a lot of people complain about them. In my case I got the best stability with the AMD drivers delivered in the driver bundle from Framework. There is still an issue with sleep, but I’ve disabled it and use only hibernate. When I have more time I can re-enable sleep mode and log the output from the EC debug console.

Same here. Since 24H2 my AMD FW16 is just a brick. I had to disable the AMD audio compressor, and it still can’t sleep, hibernate, reboot or power on reliably …

It’s a total mess …

And there is nothing we can do … because it’s closely tied to the combination of FW16 + AMD driver + Windows 24H2

Oh damn, I forgot: I also had to disable the AMD audio compressor! Urgh

Hi,

Here is a progress update on the issue.

What I have done so far:

  1. I have fixed various bugs in the EC code so that it does a better job of catching Port80 codes without missing any. I have also added a Port80 code badbadxx that is logged when the EC detects a hardware overrun.
  2. I have added some Port80 codes to the Linux kernel so that I can tell whether the kernel itself requested the reboot (for example because of a kernel bug) or whether the machine rebooted without the kernel asking for it.
  3. During my work on 1 and 2, I spotted a performance problem where the EC can actually slow down the CPU. I will investigate that too, as I don’t think it is a feature we need.
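
To illustrate the logging idea in item 1, here is a small Python model of a buffer that keeps only the most recent 4096 codes and records an overrun marker. This is not the EC code, which is firmware; the class and method names are hypothetical, and reading badbadxx as a 32-bit value 0xBADBADxx with a free low byte is an assumption.

```python
from collections import deque


class Port80Log:
    """Keep the most recent Port80 codes, discarding the oldest when full."""

    def __init__(self, capacity=4096):
        # A deque with maxlen drops the oldest entry on overflow.
        self._codes = deque(maxlen=capacity)

    def record(self, code):
        """Store one code, truncated to 32 bits."""
        self._codes.append(code & 0xFFFFFFFF)

    def record_overrun(self, detail=0):
        """Store an overrun marker.

        Assumption: the marker is 0xBADBADxx, where xx is a detail byte
        supplied by the caller. The real meaning of xx is not stated.
        """
        self.record(0xBADBAD00 | (detail & 0xFF))

    def dump(self):
        """Return the stored codes, oldest first."""
        return list(self._codes)
```

The marker makes a gap in the log visible: a reader of the dump knows that codes may be missing at that point instead of assuming the sequence is complete.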

So I am still waiting for my laptop to FTR, but at least I am now better prepared to diagnose why.

Thanks for continuing to dig further into this, and thanks for the GitHub issue filings.