Another way for people to help diagnose this problem is for them to purchase or make an EC CCD (Closed Case Debug).
For example, I have one of the ones from here:
It works on both FW13 and FW16.
It permits console port access to the EC.
For example, I have modified my Linux kernel to output port80 codes when it hits a kernel panic.
After a freeze and a reboot, I can then go into the EC and dump the history of port80 codes, so I can see whether it panicked or froze without panicking.
That narrows down the problem a little.
So, if more people had an EC CCD, we might find the root cause quicker.
It is interesting that @Yam found that no sysrq keys worked.
For example, if you press the sysrq reboot key, and it does not reboot, it implies that it is not in a normal kernel panic handler at the time of the freeze.
Ok, we don’t know what it is, but it is helping us discount lots of things it might be.
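Before reading too much into a dead sysrq reboot key, it is worth ruling out that the kernel simply has that function masked off. Here is a rough Python sketch of that check. It assumes the usual `/proc/sys/kernel/sysrq` bitmask, where `1` enables everything and bit `128` allows reboot/poweroff; the function names are just mine for illustration.

```python
from pathlib import Path

# Standard procfs location of the sysrq bitmask on Linux.
SYSRQ_PATH = Path("/proc/sys/kernel/sysrq")

# Bit documented in the kernel's sysrq guide as "allow reboot/poweroff".
SYSRQ_REBOOT_BIT = 128


def sysrq_reboot_allowed(mask: int) -> bool:
    """Return True if this sysrq mask permits the reboot key.

    A mask of exactly 1 means "all functions enabled"; otherwise the
    individual bit has to be set.
    """
    return mask == 1 or bool(mask & SYSRQ_REBOOT_BIT)


def read_sysrq_mask(path=SYSRQ_PATH) -> int:
    """Read the current mask (hypothetical helper, path is overridable)."""
    return int(Path(path).read_text().strip())
```

If `sysrq_reboot_allowed(read_sysrq_mask())` is true on a healthy boot and the key still does nothing during a freeze, that supports the idea that the kernel is not processing keyboard input at all at that point.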
@jared_kidd I’m sorry, but how would that help when the LTS kernel also has this issue ? I’m really thinking this isn’t just a kernel issue at this point.
Thanks, @James3 ! That looks very interesting, and I’d definitely be down to get one and try to debug framework issues. I’m just not sure how to use it.
I can also confirm sysrq not working during a freeze, while those keys work normally otherwise. I suspect it might be a CPU freeze due to a power state change. I don’t think I have ever seen it freeze while in performance mode.
One of the solutions mentioned is to set /sys/class/drm/card1/device/power_dpm_force_performance_level to high instead of auto, which seems to fix the problem for me right now. I’m wondering if this is related to (at least) one of the crashes experienced.
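For anyone who wants to try the same workaround, here is a minimal Python sketch of reading and changing that setting. It assumes the sysfs path quoted above (the `card1` index can differ between machines), only handles the two values mentioned here (`auto` and `high`), and needs root to write. The helper names are made up for illustration.

```python
from pathlib import Path

# Path as quoted in the post; the card index may be different on your system.
DPM_LEVEL_PATH = Path(
    "/sys/class/drm/card1/device/power_dpm_force_performance_level"
)

# Only the two values discussed in this thread; the driver accepts others.
KNOWN_LEVELS = {"auto", "high"}


def get_dpm_level(path=DPM_LEVEL_PATH) -> str:
    """Return the current performance level as a plain string."""
    return Path(path).read_text().strip()


def set_dpm_level(level: str, path=DPM_LEVEL_PATH) -> str:
    """Write a new level and return the previous one so it can be restored."""
    if level not in KNOWN_LEVELS:
        raise ValueError(f"unexpected level: {level!r}")
    previous = get_dpm_level(path)
    Path(path).write_text(level + "\n")
    return previous
```

Calling `set_dpm_level("high")` as root should switch the GPU away from `auto`, and the returned value tells you what to write back later. The setting does not survive a reboot, so it has to be reapplied on each boot if it turns out to help.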
I don’t have those visual glitches (only audio glitches). Thanks for the links. One thing to note: I’m on dual boot and I’ve never seen these freezes on Windows, though I do use Linux more, and when I’m on Windows I’m almost always plugged in.
I haven’t experienced a single crash in a full month of intensive usage, until today, when it randomly froze again. Had to manually shut it down, then like 20 minutes later it froze again but restarted on its own.
I can’t say if I was just lucky or if something made it not crash during this month, but it’s definitely a big mystery still. Worth noting that I have many visual glitches happening still, but only for a few seconds in a whole day.
Although the freezes are less frequent than before, I noticed something interesting.
Whenever the computer freezes completely (and doesn’t auto-restart), the next time I reboot, I’m almost certain to experience another freeze in the upcoming hours (if not minutes), but the computer will auto-restart this time. Thereafter, it comes back to normal, and it’s up to good luck whether I get another crash or not.
I’m here to drop another data point: I’m on a completely different OS (Nitrux) and kernel (6.11.5-1-liquorix-amd64), I am using Wayland, and I have the GPU. I’m experiencing random freezes occasionally, but it never reboots on its own. Today was the first time it froze more than once.