I’m currently away from home for various reasons and have been using my FW16 as a daily driver. Generally speaking it works great. However, about once a day when I attempt to launch a game, the system hard-reboots (back to the Framework logo and then into boot device select). There is no warning, no useful entry in Event Viewer (it just shows the usual “the last system shutdown was unexpected” and “the system has rebooted without cleanly shutting down first”) and no Windows crash dump. Of note, when it does come back up into Windows, the fans immediately ramp up quite high, so it feels like an overheating issue. But it happens so early, usually at the same moment the game’s window first opens, that it seems improbable the system could get that hot that quickly.
I’ve had this happen with both s&box (essentially Garry’s Mod 2, running on Source 2) and Tom Clancy’s The Division 2 (running on Ubisoft’s Snowdrop engine). Notably, both games are quite graphically intensive. Every time it’s happened (3 or 4 times so far, I haven’t been counting), it’s been within half a second of the game window opening.
I have the laptop plugged into the wall via the stock power supply.
Specs-wise, I’m on a FW16 with the stock 64GB RAM kit, the Radeon 7700S, the Ryzen 7940HS, and Windows 11 installed on the WD Black 2280 SSD. I do have a Sabrent 2230 SSD installed, but it has Linux on it, so Windows doesn’t use it. GPU drivers are the latest currently available (released at the beginning of May) and Windows is up to date. Latest BIOS/UEFI too (03.03). I have an external monitor connected over the DisplayPort module, but the game runs on the laptop display since it has the higher refresh rate.
Has anyone else had a similar experience, and/or does anyone have advice on where else I could start diagnosing?
FWIW I will try those, but I would expect memory or NVMe issues to result in a bluescreen 99.9% of the time, not a hard reboot. It’s also specific enough that I’m not sure attempting to repro under Linux will prove anything if it doesn’t happen there, though it will be interesting if it does.
One other thing to try is reseating the dGPU. I would take the interposer off, gently and carefully clean the pads with a Q-Tip and 99% isopropyl alcohol, and check the interposer itself for any malformed pins.
Check whether the thermal pads are making contact with your SSDs. There is one at the bottom for the 2230 drive, and one on the mid-plate for the 2280 drive. This will also give you a chance to reseat the NVMe drives.
As @Jarad_kidd stated, perform a memory test (give it a good amount of time, maybe a few hours, overnight if possible), check NVMe health, and, worst case, run CPU/GPU burn-in tests. FurMark seems to be the go-to for GPU stress testing. For CPU, I’d just use it to mine some Monero, or use CPU-Z, HWiNFO64, or old-school Prime95 (a mix is a good idea because each tool stresses something different on the CPU; I learned this in my heavy OCing days).
Good call on reseating the dGPU. I got the DIY Edition, but the dGPU was pre-installed and I haven’t needed to take it out yet, so I’ll give that a shot. But same as my previous reply w.r.t. possible CPU stability issues: in the vast majority of cases where I’ve had CPU/RAM/NVMe problems on PCs I’ve built, a hard crash and reboot isn’t what I’d expect to see.
In my experience, weird behavior like this needs a troubleshooting game plan. From your post, I’d probably focus on the GPU itself first, and since this is a modular laptop, the first thing I’d look at is the connection between the mainboard and the GPU. You can also do a visual inspection (check whether any SMT components got knocked off). Also keep an eye on temps during the stress tests.
Then I’d move on to focusing on the CPU and memory (they are tied together), so stress testing CPU/memory, monitoring temps, etc.
I don’t usually suspect storage if I can reproduce the issue from a Live USB, since that eliminates storage as the cause. But if you haven’t already done that, I wouldn’t remove the NVMe drives from the list of suspects.
The other thing is to check whether it happens on another OS, whether that’s a fresh install of Windows (the better 1-to-1 test) or a different OS (not 1-to-1, but still another data point).
It’s all about getting as many data points as you can to suss out any patterns.
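If you do attempt the repro from Linux (the Sabrent install or a Live USB), something like the Python sketch below could log CPU temperature once a second while a stress test or game runs. It assumes the AMD k10temp driver is loaded and exposes its reading as temp1_input under /sys/class/hwmon, in millidegrees Celsius. The function names are made up for illustration, and the 90 °C flag is an arbitrary threshold, not a documented limit.

```python
import time
from pathlib import Path


def find_hwmon(name="k10temp", root="/sys/class/hwmon"):
    """Return the hwmon directory whose 'name' file matches, or None.

    Assumption: k10temp is the driver exposing the AMD CPU temperature.
    """
    for entry in sorted(Path(root).iterdir()):
        name_file = entry / "name"
        if name_file.is_file() and name_file.read_text().strip() == name:
            return entry
    return None


def read_temp_c(hwmon_dir, sensor="temp1_input"):
    """Read one hwmon temperature file and convert millidegrees C to degrees C."""
    return int((Path(hwmon_dir) / sensor).read_text().strip()) / 1000.0


def log_temps(duration_s=60, interval_s=1.0, threshold=90.0):
    """Print a timestamped CPU temperature every interval_s seconds for duration_s seconds."""
    hwmon = find_hwmon()
    if hwmon is None:
        raise RuntimeError("k10temp hwmon device not found")
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        temp = read_temp_c(hwmon)
        # Mark samples at or above the (arbitrary) threshold so hot stretches stand out.
        flag = "  <-- at/above threshold" if temp >= threshold else ""
        print(f"{time.strftime('%H:%M:%S')} {temp:5.1f} C{flag}")
        time.sleep(interval_s)
```

Usage would be something like calling log_temps(duration_s=120) in a second terminal just before launching the game. Reading sysfs directly keeps it to the standard library, at the cost of only knowing about the one sensor it looks for.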
After a bit more digging, it appears the likely issue is that the CPU gets too hot. The temperature reaches the high 90s (Celsius) even when it doesn’t crash (98.6 was the highest I saw), and the fans take a good 45 seconds to ramp up to maximum, even though the CPU stays above 90 degrees for that entire period. I’m not sure why they ramp up gradually like this instead of jumping straight to maximum.
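To put a number on how long the CPU sits in the 90s before the fans catch up, one option is to enable HWiNFO64’s CSV sensor logging while launching a game and scan the log afterwards. This is a minimal sketch under assumptions: the column name is a guess at how the CPU temperature header appears (check the header row of your own log), and the 2-second sample interval is an assumption that should be set to whatever polling interval you actually configured.

```python
import csv
import io

# Hypothetical column name: replace it with the exact CPU temperature header from your log.
CPU_TEMP_COLUMN = "CPU (Tctl/Tdie) [°C]"


def longest_hot_streak(csv_text, column=CPU_TEMP_COLUMN, threshold=90.0, interval_s=2.0):
    """Return (seconds, peak) for the longest run of consecutive samples at or above threshold.

    interval_s is an assumed sampling interval; seconds = streak length * interval_s.
    peak is the highest temperature seen, or None if no numeric samples were found.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    longest = current = 0
    peak = None
    for row in reader:
        raw = (row.get(column) or "").strip()
        try:
            temp = float(raw)
        except ValueError:
            # Skip blank or non-numeric cells, e.g. repeated header rows.
            continue
        peak = temp if peak is None else max(peak, temp)
        if temp >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest * interval_s, peak
```

To use it, read the log file into a string and pass it in. If the degree sign in the header comes through garbled because of the file’s encoding, copy the column name exactly as it appears after reading rather than as it looks in HWiNFO64.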
I think I have the same issue. What I’ve noticed is that it mostly happens only after the laptop has been suspended or hibernated. The crash also happens whether I’m using the APU or the GPU.