I think that if the SSD requires the machine to explicitly send a shutdown “notification”, and otherwise increments “unsafe shutdown” by 1, the counter is too sensitive.
If you, hypothetically, powered on the SSD only by connecting VCC and GND, then powered it off by cutting power to VCC, “unsafe shutdown” would still increase even though you did absolutely nothing in terms of data.
This could explain why something as mundane as adjusting the keyboard backlight causes it to increase: the backlight LEDs and the SSD’s VCC are likely powered by the same rail.
Even with the bug mentioned above, the counter shouldn’t jump by 400+ within a day. Something is definitely broken, either the SSD itself or the mainboard powering it.
Hey there,
I’m having a similar issue here on my FW 13 Ryzen 7 350, about 3 weeks old.
I have two drives with two OSes: a 1TB Kioxia G3 internally for my Fedora 42 installation and a 250GB Expansion Card with Windows 11 installed.
I took a look today at CrystalDiskInfo in Windows and saw that my Kioxia has about 280 power-on hours (that tracks) and 4014 power cycles with 3819 unsafe shutdowns (that doesn’t track).
The Framework expansion card running Windows has about 80 cycles with 20 unsafe shutdowns, which may or may not be accurate. I sometimes leave it plugged in when booted into Fedora but never really mount it; I usually take it out, though.
It appears to me, too, that the issue resides with my Fedora installation, but I can’t see exactly what’s causing the problem, so I figured I’d chime in as another data point. I’d be glad to help troubleshoot this given guidance (I’m fairly well versed in using Linux).
I’m having two other issues with Fedora that may or may not be related to the drive disconnecting. The first is that the device sometimes kernel panics when left idle (with the lid open, mind you; I don’t know whether it goes to sleep first, since I’ve only caught it a few times after leaving it open while away from home for a while).
The other is that sometimes, when waking from sleep, the Goodix fingerprint sensor no longer works or shows up in lsusb, requiring a full reboot for it to show up again.
I’ll be keeping an eye out for replies and potential help or ideas to try myself.
Sorry to hear that, and thanks for sharing your case. I’m using btrfs and occasionally run btrfs scrub and take subvolume backups, so that if anything goes wrong I can recover more quickly.
I’ve opened a ticket and have been in contact with support for a while. We haven’t had any “ah-ha, that’s it” moment and are still running different experiments. I’m currently testing whether setting power-profiles-daemon to a different power mode helps (I used to use the KDE GUI to switch automatically to powersave on battery and balanced otherwise; I’m now using always-balanced as an experiment), and I’m also keeping an eye on whether an ordinary poweroff/reboot affects the counters, too.
Speaking for myself: dmesg has little to say on this issue, since every time I find smartctl reporting irregular values again, it is after I wake the laptop from s2idle, during which user space and the kernel inherently have less of a grasp of the whole picture because, well, s2idle. I guess one has to be equipped with decent kernel driver and ACPI knowledge to debug such issues. One thing that’s certain is that power cycles and unsafe shutdowns always seem to come hand in hand.
Someone from another community suggested patching the DSDT to forcibly enable S3/suspend-to-RAM, to see whether this is ultimately just another case demonstrating that manufacturers are not good at implementing s2idle. I’m not sure it’s worth it and have not tried it, since it’s not supported by either AMD or Framework after all.
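For anyone who wants to check which suspend mode their machine is actually using before going down the DSDT route, here is a minimal Python sketch. It assumes the kernel exposes /sys/power/mem_sleep in the usual format, with the active mode in square brackets; the function name parse_mem_sleep is just something made up for this example.

```python
from typing import List, Optional, Tuple


def parse_mem_sleep(text: str) -> Tuple[Optional[str], List[str]]:
    """Return (active_mode, all_modes) from the contents of /sys/power/mem_sleep.

    The kernel lists the supported modes separated by spaces and wraps the
    currently selected one in square brackets, e.g. "[s2idle] deep".
    """
    active = None
    modes = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return active, modes


# Example with a sample string (not read from a real machine):
active, modes = parse_mem_sleep("[s2idle] deep\n")
print(f"active: {active}, available: {modes}")

# On a real system you would read the file instead:
#   from pathlib import Path
#   parse_mem_sleep(Path("/sys/power/mem_sleep").read_text())
```

If only s2idle is listed, the firmware does not advertise S3 at all, which is the situation the DSDT patch is meant to work around.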
Same issue here, with btrfs on dm-crypt on luks on nvme. However, since rebooting into the latest 6.15.7 kernel, the issue has not been experienced (Edit: last suspend/wake cycle jumped the count from 51179 to 52000! The issue persists with 6.15.7.). Fingers crossed it stays that way! I racked up an insane number of unsafe shutdowns over the last 2 months:
Folks, we are actively tracking this. However, unless you are very clear on specs, your data is not helping us better understand what you’re experiencing. Distro and kernel are great, but laptop model and specs are needed as well. Thanks.
Which platforms support that, and would that flag cause issues on those that do not? For the 13" I have 11th and 12th gen Intel boards and an AMD 7640, in the 16" I have the AMD 7940, and in the 12" I have the Intel 13th gen i3. I typically boot off of an expansion card that I move between machines, so would not want to set a kernel parameter that caused issues with some platforms. Thanks!
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 39 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 1,034,366 [529 GB]
Data Units Written: 6,886,308 [3.52 TB]
Host Read Commands: 11,899,287
Host Write Commands: 136,022,334
Controller Busy Time: 179
Power Cycles: 2,814
Power On Hours: 343
Unsafe Shutdowns: 809
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Power Cycles increments whenever I suspend/resume; Unsafe Shutdowns does so about 1/3 of the time, I guess? I don’t think I’ve fully rebooted this thing 800 times. Also, I’ve never seen signs of disk or filesystem corruption.
Since my last report, I have not shut down or restarted (uptime is 17 days), but “Unsafe Shutdowns” has increased from 809 to 821:
Temperature: 40 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 1,034,437 [529 GB]
Data Units Written: 6,947,973 [3.55 TB]
Host Read Commands: 11,901,443
Host Write Commands: 137,301,470
Controller Busy Time: 181
Power Cycles: 2,856
Power On Hours: 347
Unsafe Shutdowns: 821
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
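In case it helps others compare their own reports, here is a rough Python sketch that diffs two pasted smartctl text reports. It assumes the "Name: value" layout shown above; the helper names (parse_counters, delta) are made up for this example, and the sample strings contain only the three counters of interest, copied from the two reports.

```python
import re


def parse_counters(report):
    """Map each 'Name: 1,234 ...' line of a smartctl report to an integer."""
    counters = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        tokens = value.split()
        if not sep or not tokens:
            continue
        # Accept plain or comma-grouped integers, optionally followed by '%'.
        # Values such as 0x00 are skipped on purpose.
        if re.fullmatch(r"[\d,]+%?", tokens[0]):
            counters[key.strip()] = int(tokens[0].rstrip("%").replace(",", ""))
    return counters


def delta(before, after):
    """Difference for every counter present in both reports."""
    return {k: after[k] - before[k] for k in before if k in after}


FIRST = """Power Cycles: 2,814
Power On Hours: 343
Unsafe Shutdowns: 809"""

SECOND = """Power Cycles: 2,856
Power On Hours: 347
Unsafe Shutdowns: 821"""

changes = delta(parse_counters(FIRST), parse_counters(SECOND))
print(changes)
# Share of the new power cycles that were also counted as unsafe shutdowns.
print(round(changes["Unsafe Shutdowns"] / changes["Power Cycles"], 2))
```

With these two reports the difference is 42 power cycles and 12 unsafe shutdowns, so a bit under a third of the new cycles were counted as unsafe, which matches the rough "1/3" guess.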
Same thing happening here. I have had the laptop for a couple of months and smartctl shows 4000 power cycles and 3800 unsafe shutdowns.
Specs:
FW13 Ryzen 7 350
2x16 5600MHz Kingston Fury
1TB Kioxia Exceria G3 NVMe (contains Fedora 42 with latest updates)
250GB Framework expansion card (contains Windows 11 in To Go mode, works fine and doesn’t seem to cause any power cycle issues)
OK, this is interesting: I removed Linux from my internal SSD and cloned my Windows installation over from the SSD expansion card. I took a screenshot of CrystalDiskInfo right after installing it about a month ago and, lo and behold, the power cycle count has increased by about 900 (I have NOT rebooted or slept this machine anywhere near 900 times) and Unsafe Shutdowns has gone from 0x1417 to 0x156A (an increase of 339 in decimal, so ~11 times a day, again nowhere near the number of power cycles I do myself). I have upgraded to the latest firmware and it made no difference.
This sounds like a serious firmware issue to me, or Linux and Windows happen to be having a similar problem. Either way, I feel like this needs to be investigated further, because I’m not at all certain that power cycling the SSD or doing unsafe shutdowns on it is good for the drive.
I have now confirmed this happens on at least one more identical Framework device with the same Kioxia SSD running Windows 11. It might just be normal, but I’m still not sure this should be happening.
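For anyone double-checking the hex values that CrystalDiskInfo shows, the arithmetic can be reproduced with a few lines of Python. The 30-day span is an assumption standing in for "about a month".

```python
# Raw values as displayed in hexadecimal.
before = int("0x1417", 16)
after = int("0x156A", 16)

delta = after - before

# Assumption: "about a month" is taken as 30 days.
days = 30
per_day = delta / days

print(before, after, delta)   # 5143 5482 339
print(round(per_day, 1))      # 11.3
```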
Model Number: Samsung SSD 990 PRO 4TB
Firmware Version: 4B2QJXD7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Before the system suspends:
Power Cycles: 3,157
Power On Hours: 2,090
Unsafe Shutdowns: 2,405
After coming back from suspend:
Power Cycles: 3,159
Power On Hours: 2,090
Unsafe Shutdowns: 2,406
One suspend caused 2 power cycles and 1 unsafe shutdown.
Currently on Gentoo with Linux kernel 6.17.1.
The kernel option nvme.noacpi=1 totally killed suspend: I could not get my system out of suspend. The power button LED stopped flashing, but the screen stayed black.
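To make this kind of before/after check repeatable, here is a small Python sketch built around smartctl's JSON output (the -j flag). The key names below are what I understand smartctl to emit for NVMe drives, so treat them as an assumption and compare against your own output; the device path /dev/nvme0 and the function names are placeholders for this example. The sample JSON strings contain only the three counters, using the figures above.

```python
import json
import subprocess

# Assumed key names in smartctl's JSON output for NVMe drives.
LOG_KEY = "nvme_smart_health_information_log"
FIELDS = ("power_cycles", "unsafe_shutdowns", "power_on_hours")


def counters_from_json(text):
    """Pick the three counters of interest out of smartctl's JSON output."""
    log = json.loads(text)[LOG_KEY]
    return {name: log[name] for name in FIELDS}


def read_counters(device="/dev/nvme0"):
    """Run smartctl (usually needs root) and return the counters."""
    # No check=True: smartctl's exit status is a bitmask and can be
    # non-zero even when the report itself was printed successfully.
    result = subprocess.run(
        ["smartctl", "-j", "-A", device], capture_output=True, text=True
    )
    return counters_from_json(result.stdout)


def diff(before, after):
    """Per-counter increase between two snapshots."""
    return {name: after[name] - before[name] for name in FIELDS}


# Sample snapshots built from the before/after figures above.
before = counters_from_json(
    '{"nvme_smart_health_information_log": '
    '{"power_cycles": 3157, "unsafe_shutdowns": 2405, "power_on_hours": 2090}}'
)
after = counters_from_json(
    '{"nvme_smart_health_information_log": '
    '{"power_cycles": 3159, "unsafe_shutdowns": 2406, "power_on_hours": 2090}}'
)
print(diff(before, after))
```

On a real machine you would call read_counters() once before suspending and once after resume, then pass both results to diff().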