Hey there!
I couldn’t read through the entire thread, but the issue you’re having looks similar to the one plaguing Lenovo Yoga laptops with AMD Ryzen 6000 mobile CPUs. On my unit (Ryzen 6800U/Radeon 680M) I get a BSOD after the screen freezes for a few minutes (usually 5 to 10). The error is usually a DPC_WATCHDOG_VIOLATION somehow related to the amdi2c.sys driver.
We had similar threads in the Lenovo community forum (here and here), but none of us found a solution. Lenovo tech support couldn’t debug it either: they say it’s not reproducible on their units, and it also happens randomly. Long story short, after almost 2 years of use our laptops are still bugged and randomly freezing.
I look forward to seeing if you guys can find any hint about this. Hopefully it’s just some buggy BIOS setting that can be tuned with Smokeless’ UMAF.
I know that my laptop is different from yours, but I’m happy to run any test that comes up in this thread.
Cheers
These freezes are beginning to look consistent across platforms.
My FW13 has an Intel Core i5 and runs OpenSuSE, but FW users’ reports on this website suggest freezes are far more frequent on Ryzen systems. At a sheer guess, I’d say they only occur on my laptop around once a week on average.
Until recently I’d assumed the system was always permanently frozen. However, a few days ago there was a delay of ~5-15 minutes before I got around to the usual ritual of forcing a power-down and rebooting, and by then it had apparently unfrozen and was running normally.
Very occasionally the system seems to have episodes where simple actions like responding to a mouse click are randomly delayed by seconds.
Maybe clicking the trackpad brings it all back to life, as I wondered above? I’d like to disable the trackpad completely as I don’t use it (I use a wired mouse / USB hub) and that would simplify things.
I’ll have a look at the journal next time it occurs.
I couldn’t find any watchdog violations in the Linux system journal, however I’m not sure this is the whole story. Maybe freezing pre-empts a journal entry, and the diagnostics are likely to be different anyway.
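For anyone who wants to repeat that journal search, here is a minimal Python sketch. The keyword list is my own guess at messages worth looking for, not an exhaustive or official one, and `read_previous_boot_kernel_log` assumes a systemd journal that is kept across reboots. Finding no matches doesn’t rule out a freeze, since a hard lockup may stop anything from being written.

```python
import re
import subprocess

# Hypothetical keyword list: common kernel phrases for stalls and lockups.
PATTERN = re.compile(
    r"watchdog|soft lockup|hard lockup|blocked for more than|clocksource",
    re.IGNORECASE,
)

def suspicious_lines(lines):
    """Return only the log lines that mention a watchdog, lockup or clocksource."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]

def read_previous_boot_kernel_log():
    """Fetch kernel messages from the previous boot (needs a persistent journal)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.splitlines()

# Made-up sample lines, only to show the filter in action.
sample = [
    "kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s!",
    "kernel: usb 1-1: new high-speed USB device number 4",
    "kernel: INFO: task foo:123 blocked for more than 120 seconds.",
]
matches = suspicious_lines(sample)
```

After a forced reboot, `suspicious_lines(read_previous_boot_kernel_log())` would list whatever the kernel managed to log before the freeze.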
Double-check your drivers and remove unnecessary peripherals to fix the DPC Watchdog violation error.
The DPC Watchdog error can be caused by outdated or incorrectly installed drivers. Software conflicts are less common culprits.
Solve the DPC Watchdog violation error in Windows 10 and Windows 11 by checking IDE ATA/ATAPI controllers, removing external devices, updating SSD drivers, and scanning system files for errors. If all else fails, try a system restore.
All this is consistent with the idea it’s an adapter problem of some sort, and with the reported facts. In particular (in my case):
the possibility that clicking the unused trackpad may pull it out of the frozen state;
the episode of erratic responses mentioned earlier, since I’d just changed the adapters around (for convenience’s sake) without rebooting the system, and I think it hasn’t occurred since.
Hi, I have the 7040 13-inch laptop and am getting hard freezes from time to time. Which AMD graphics driver do you recommend I download?
Is there some AMD auto-downloader?
Do I need to uninstall the previous driver that came with the OS?
My FW13 i5 had several freezes on Saturday morning which provided an opportunity to make some systematic observations of the problem, with these results.
As mentioned previously, freezes tend to occur in episodes separated by problem-free periods. On Saturday mid-morning random screen responses became delayed by sub-second to maybe 2-sec “freezes” and the system eventually froze completely after a minute or so, recovered after ~15 minutes, and froze again.
That has all the characteristics of electrical noise, maybe causing spurious interrupts.
The system always does recover as M_R stated. However only the integrated monitor was unresponsive. The time-of-day clock apparently continued to be updated correctly (it showed the correct time when the screen came back to life after 15 mins), the screen darkened and increased brightness when the outboard mouse or the trackpad was moved during this time, and the spreadsheet in use beforehand continued to run normally afterwards. Nudging the mouse during a freeze moved the cursor when the monitor recovered.
The FW13 had an unused HDMI adapter, so the empty HDMI cable socket was electrically floating. Taking a hint from the How-To-Geek article linked above, I removed it and rebooted.
So far, two full days later, no problems. If it runs for a month without problems I’ll declare the problem solved.
But the floating adapter connector doesn’t seem like a good idea, either electrically or since it’s exposed to dust on desk surfaces, etc. Is it possible to buy a dummy cover?
Yes, that’s definitely on the list and it has certainly worked for some users. However the highly episodic nature of the problem suggests to me there’s an underlying cause which the updated firmware may handle better.
I had a single freeze last night after 4 1/2 days, so I’ve reinstated the HDMI adapter and will update the BIOS ASAP.
But it would be nice to know why updating the BIOS seems to fix the problem rather than just doing it and hoping for the best! This still doesn’t feel to me like a common-or-garden program bug because it’s too randomly episodic.
The 3.05 update notes four fixes, only one of which (the thermal issue with Linux) could possibly be environmental or hardware related and it’s winter here. However the retimer update might do the trick: it could account for the fact it’s only the integrated display which freezes, and freezes evidently happen much more frequently on faster Ryzen FWs where I guess signal timing is more critical.
Two questions though…
The update notes show two links to the same Linux BIOS 3.05 update, but one carries the comment “You must be running 3.05 or later to apply this update using EFI.” which is obviously circular. What is really meant?
And am I correct in thinking that the Linux BIOS update includes the shell? Or is the shell provided by the O/S like the drivers?
What’s the general stability of the 3.05 BIOS? I’m still running 3.03b, and the experience of previous bluescreens has made me reluctant to update since the system is stable now.
Is anyone else still having these issues? I continue to get a lagging cursor that eventually turns into a full hard lockup. It has blue-screened in the past with the DPC WATCHDOG VIOLATION error, but lately the freezes are simply hard locks that never reach a BSOD.
Memtest86 passes, and I’ve tried reinstalling Windows and the drivers, swapping RAM around, and running with and without expansion cards. Nothing seems to solve it.
I’m still discussing this with support. It’s frustrating how unreliable the laptop has been.
I recently had a serious event. I was working as usual with the laptop hooked up to an external monitor, with a Bluetooth keyboard and mouse attached through the external monitor’s USB hub. I had a dozen or so PDFs open, three or four Firefox tabs and Word. The computer hard froze. That does happen, although it hadn’t for many months. I hadn’t updated the AMD drivers to the latest set released earlier this month, but had the April 2024 BIOS 3.05 update installed.
So I pressed the power button to restart and the Windows recovery environment opened up. For the next few hours I attempted the usual Windows 11 recovery steps, using my BitLocker recovery code, but to no avail. In truth, I had turned off the restore points to conserve drive space.
I then went to the reset stage, using a USB key created on my desktop computer. But a reset saving my personal files was not even possible. It would attempt to reset, but get to 1% of the job and then stop, indicate that the changes were being undone, and I would be returned to the recovery environment. I worked with a friend who is an IT person at Microsoft and we weren’t able to resolve why this reset option wasn’t available.
So I had to do a clean install of Windows. Not the end of the world, as I have a NAS backup, but of course worrisome.
I’ve since turned restore points back on for the laptop, and everything has been working well since the fresh install.
Hello, I’ve had a lot of BSODs here (CLOCK_WATCHDOG_TIMEOUT and others) since the late September/early October Windows 11 updates.
I’m keeping a backup from when things were OK, in case the next updates don’t fix the issue…
I’m on NixOS with Linux 6.11.5 (AMD 7840U) and I’ve had two issues. First, I had a hard freeze, and when I checked dmesg I saw this:
[37693.948431] clocksource: timekeeping watchdog on CPU8: Marking clocksource 'tsc' as unstable because the skew is too large:
[37693.948457] clocksource: 'hpet' wd_nsec: 503361740 wd_now: 5bbcd559 wd_last: 5b4edc21 mask: ffffffff
[37693.948466] clocksource: 'tsc' cs_nsec: 503904732 cs_now: 70f2521498fa cs_last: 70f1ef26ae21 mask: ffffffffffffffff
[37693.948472] clocksource: Clocksource 'tsc' skewed 542992 ns (0 ms) over watchdog 'hpet' interval of 503361740 ns (503 ms)
[37693.948479] clocksource: 'tsc' is current clocksource.
[37693.948536] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[37693.948842] clocksource: Checking clocksource tsc synchronization from CPU 8 to CPUs 0,6,9,12-15.
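To make the numbers in that excerpt easier to read, here is a small Python sketch that recomputes the reported skew from the two `*_nsec` fields. The helper `parse_nsec` is a hypothetical name of mine, and reading the skew as the plain difference between the two intervals is an assumption, although it does reproduce the 542992 ns printed in the log.

```python
import re

def parse_nsec(line, key):
    """Extract an integer field such as 'wd_nsec: 503361740' from a dmesg line."""
    match = re.search(rf"{key}:\s*(\d+)", line)
    if match is None:
        raise ValueError(f"{key} not found in line")
    return int(match.group(1))

# The two lines are copied from the dmesg output above (timestamps dropped).
hpet_line = "clocksource: 'hpet' wd_nsec: 503361740 wd_now: 5bbcd559 wd_last: 5b4edc21 mask: ffffffff"
tsc_line = "clocksource: 'tsc' cs_nsec: 503904732 cs_now: 70f2521498fa cs_last: 70f1ef26ae21 mask: ffffffffffffffff"

wd_nsec = parse_nsec(hpet_line, "wd_nsec")  # interval as measured by the HPET watchdog
cs_nsec = parse_nsec(tsc_line, "cs_nsec")   # same interval as measured by the TSC

skew_ns = cs_nsec - wd_nsec                 # how far the TSC ran ahead of the HPET
skew_ppm = skew_ns / wd_nsec * 1_000_000    # relative drift in parts per million

print(f"skew: {skew_ns} ns over {wd_nsec} ns ({skew_ppm:.0f} ppm)")
```

So over the roughly half-second watchdog interval the TSC ran about 0.1% fast relative to the HPET, which was enough for the kernel to mark it unstable.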
Second, I’ve encountered what I think is an amdgpu bug under 6.11.3, which made the laptop extremely slow.
I tried suspending and resuming the laptop. That did not fix the issue under 6.11.3, but I saw these additional lines:
[385554.065285] PM: suspend entry (s2idle)
[385554.072972] Filesystems sync: 0.007 seconds
[385554.098957] Freezing user space processes
[385554.101760] Freezing user space processes completed (elapsed 0.002 seconds)
[385554.101765] OOM killer disabled.
[385554.101766] Freezing remaining freezable tasks
[385554.102918] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[385554.102924] printk: Suspending console(s) (use no_console_suspend to debug)
[385555.144924] queueing ieee80211 work while going to suspend
[385557.028247] ACPI: EC: interrupt blocked
[385596.855897] amd_pmc AMDI0009:00: Last suspend didn't reach deepest state
[385596.927936] ACPI: EC: interrupt unblocked
[385596.979090] clocksource: timekeeping watchdog on CPU11: hpet wd-wd read-back delay of 260019ns
[385596.979100] clocksource: wd-tsc-wd read-back delay of 3732876ns, clock-skew test skipped!
[385597.129649] [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
[385597.129847] amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
[385597.133052] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
[385597.140000] nvme nvme0: D3 entry latency set to 10 seconds
[385597.143115] nvme nvme0: 16/0/0 default/read/poll queues
[385599.763083] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(5) dpia(0) failed with status(0), current_hpd_status(0) new_hpd_status(0)
[385600.026274] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(5) dpia(0) failed with status(0), current_hpd_status(0) new_hpd_status(0)
[385600.289362] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(6) dpia(1) failed with status(0), current_hpd_status(0) new_hpd_status(0)
[385600.551335] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(6) dpia(1) failed with status(0), current_hpd_status(0) new_hpd_status(0)
[385600.814635] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(7) dpia(2) failed with status(0), current_hpd_status(0) new_hpd_status(0)
[385601.077735] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(7) dpia(2) failed with status(0), current_hpd_status(0) new_hpd_status(0)
[385601.341270] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(8) dpia(3) failed with status(0), current_hpd_status(0) new_hpd_status(0)
[385601.604714] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(8) dpia(3) failed with status(0), current_hpd_status(0) new_hpd_status(0)
[385608.993905] amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[385608.993914] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[385608.993919] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[385608.993922] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[385608.993926] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[385608.993929] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[385608.993932] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[385608.993935] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[385608.993939] amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[385608.993942] amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[385608.993945] amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[385608.993949] amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[385608.993952] amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
[385609.269013] [drm] ring gfx_32788.1.1 was added
[385609.269649] [drm] ring compute_32788.2.2 was added
[385609.270274] [drm] ring sdma_32788.3.3 was added
[385609.270302] [drm] ring gfx_32788.1.1 ib test pass
[385609.270330] [drm] ring compute_32788.2.2 ib test pass
[385609.270435] [drm] ring sdma_32788.3.3 ib test pass
[385609.803325] OOM killer enabled.
[385609.803327] Restarting tasks ... done.
[385609.807725] random: crng reseeded on system resumption
[385610.410274] PM: suspend exit
I haven’t been able to reproduce the issue on 6.11.5 yet.
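A long resume log like the one above is easier to compare between kernel versions once the repeated errors are tallied. The following Python sketch counts `*ERROR*` lines per function and link index. The regular expression only covers the `for link(N)` shape seen in this log, and `count_errors` is a hypothetical helper name, so treat it as a starting point rather than a general amdgpu log parser.

```python
import re
from collections import Counter

# Matches lines shaped like: "... *ERROR* some_function: for link(5) ..."
ERROR_RE = re.compile(r"\*ERROR\* (\w+): for link\((\d+)\)")

def count_errors(lines):
    """Count '*ERROR*' lines per (function name, link index)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts

# A few lines copied from the resume log above.
sample = [
    "[385597.133052] amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!",
    "[385599.763083] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(5) dpia(0) failed with status(0), current_hpd_status(0) new_hpd_status(0)",
    "[385600.026274] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(5) dpia(0) failed with status(0), current_hpd_status(0) new_hpd_status(0)",
    "[385600.289362] amdgpu 0000:c1:00.0: [drm] *ERROR* dpia_query_hpd_status: for link(6) dpia(1) failed with status(0), current_hpd_status(0) new_hpd_status(0)",
]

for (function, link), count in sorted(count_errors(sample).items()):
    print(f"{function} link({link}): {count}")
```

Run over the full excerpt, this should show the same error twice for each of links 5 to 8, which gives a compact fingerprint to check against if the slowdown ever shows up again on a newer kernel.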
Depending on the apps I use, I’m having daily complete lockups on Windows 11 with the latest BIOS and drivers when using some graphics-intensive apps, including Microsoft PowerPoint and Adobe Photoshop. The lockups mostly, if not always, seem to happen during sudden intensive tasks rather than sustained heavy ones; specifically, saving a large file or copy-pasting large pictures seems to trigger a lockup.
System freezes completely without any chance to recover, display stays on. After a minute or so, the system will automatically reboot. The system event log doesn’t contain any entries and there is no BSOD to be seen.
I’ve already removed the HDMI extension and disabled the PCIe idle setting in the BIOS to make sure this is not related to either of those.