`smartctl` shows excessive power cycles and unsafe shutdowns

  • Which Linux distro are you using?
    Archlinux

  • Which release version?
    (if rolling release without a release version, skip this question)
    (If rolling release, last date updated?)
    Freshly installed

  • Which kernel are you using?
    uname -a
    Linux frmw13 6.15.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 19 Jun 2025 14:41:19 +0000 x86_64 GNU/Linux

  • Which BIOS version are you using?
    dmidecode -s bios-version
    03.03

  • Which Framework Laptop 13 model are you using? (AMD Ryzen™ AI 300 Series, AMD Ryzen™ 7040 Series, Intel® Core™ Ultra Series 1, 13th Gen Intel® Core™ , 12th Gen Intel® Core™, 11th Gen Intel® Core™)
    AMD Ryzen™ AI 300 Series

The SSD is an SK Hynix Platinum P51 (“SHPP51-2000GM”), firmware version 61060A50. I’m seeing extremely high power cycle and unsafe shutdown counts during s2idle: after the laptop slept for about 2 whole days, the counters had sky-rocketed to 436 power cycles and 416 unsafe shutdowns. I’m pretty sure both were zero in the beginning: smartctl is one of the first commands I run when I’m playing with new hardware.

At first I thought it was probably the BIOS AMD PSPP setting playing badly with the SSD. It was on when I discovered the issue, probably because I messed around in the BIOS when I first got my hands on the machine, before installing the OS. So I turned it off (i.e. no downgrade), and the following observations are all with it off. It was to no avail: I put the laptop to sleep again for about 4 more hours, and power cycles incremented by 3 and unsafe shutdowns by 4.

Kernel log doesn’t seem to suggest anything too interesting:
journalctl -o short-precise -k -b -1 | grep -i nvme

Jun 22 14:23:10.667075 archlinux kernel: nvme 0000:bf:00.0: platform quirk: setting simple suspend
Jun 22 14:23:10.667284 archlinux kernel: nvme nvme0: pci function 0000:bf:00.0
Jun 22 14:23:10.691053 archlinux kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 14:23:10.693042 archlinux kernel:  nvme0n1: p1 p2
Jun 22 14:23:18.477043 frmw13 kernel: nvme nvme0: using unchecked data buffer
Jun 22 14:23:18.479044 frmw13 kernel: block nvme0n1: No UUID available providing old NGUID
Jun 22 14:51:36.188976 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 14:52:47.632912 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 15:28:25.492790 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 17:19:08.510854 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 19:04:35.519197 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 22:55:30.540449 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
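
The queue-setup lines in this excerpt can be counted to estimate how often the controller was brought up again after boot. This is a minimal sketch, assuming each later "default/read/poll queues" line marks a controller re-initialisation on resume. That reading is suggested by the "simple suspend" quirk line but is not stated by the log itself. The regular expression and the function name `queue_setup_times` are illustrative.

```python
import re

# Kernel log excerpt copied from the post above (boot -1).
LOG = """\
Jun 22 14:23:10.667075 archlinux kernel: nvme 0000:bf:00.0: platform quirk: setting simple suspend
Jun 22 14:23:10.667284 archlinux kernel: nvme nvme0: pci function 0000:bf:00.0
Jun 22 14:23:10.691053 archlinux kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 14:23:10.693042 archlinux kernel:  nvme0n1: p1 p2
Jun 22 14:23:18.477043 frmw13 kernel: nvme nvme0: using unchecked data buffer
Jun 22 14:23:18.479044 frmw13 kernel: block nvme0n1: No UUID available providing old NGUID
Jun 22 14:51:36.188976 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 14:52:47.632912 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 15:28:25.492790 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 17:19:08.510854 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 19:04:35.519197 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 22:55:30.540449 frmw13 kernel: nvme nvme0: 16/0/0 default/read/poll queues
"""

# Matches "<timestamp> <host> kernel: nvme nvme0: 16/0/0 default/read/poll queues".
QUEUE_RE = re.compile(
    r"^(\w{3} +\d+ [\d:.]+) \S+ kernel: nvme (nvme\d+): "
    r"\d+/\d+/\d+ default/read/poll queues$"
)


def queue_setup_times(log_text, ctrl="nvme0"):
    """Return the timestamps of every queue-setup line for one controller."""
    times = []
    for line in log_text.splitlines():
        match = QUEUE_RE.match(line.strip())
        if match and match.group(2) == ctrl:
            times.append(match.group(1))
    return times


times = queue_setup_times(LOG)
reinit = times[1:]  # the first entry is the initial bring-up at boot
print(len(reinit), "queue setups after boot:")
for stamp in reinit:
    print("  ", stamp)
```

Comparing this count with the change in the SMART counters over the same boot shows whether the increments roughly track the number of suspend/resume cycles.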

I’ve read that some SSDs have trouble waking up after entering certain power states, as suggested in the Arch Wiki. The symptoms aren’t the same, since I get no system freezes, only concerning numbers from smartctl, but I decided to give it a try anyway: maybe the deepest power state of the SK Hynix P51 is somewhat buggy on AMD motherboards? So just before posting here, I rebooted with a new kernel parameter. Interestingly, rebooting the computer does not change these two fields, nor do they change while I’m actually using the laptop.
journalctl -o short-precise -k -b 0 | grep -i nvme

Jun 22 23:11:06.635519 archlinux kernel: Command line: ro root=/dev/SkHynixPlatinumP51/data add_efi_memmap rootflags=rw,relatime,ssd,subvol=/@root quite loglevel=3 nmi_watchdog=0 nvme_core.default_ps_max_latency_us=3200 initrd=initramfs-linux.img
Jun 22 23:11:06.636472 archlinux kernel: Kernel command line: ro root=/dev/SkHynixPlatinumP51/data add_efi_memmap rootflags=rw,relatime,ssd,subvol=/@root quite loglevel=3 nmi_watchdog=0 nvme_core.default_ps_max_latency_us=3200 initrd=initramfs-linux.img
Jun 22 23:11:06.701075 archlinux kernel: nvme 0000:bf:00.0: platform quirk: setting simple suspend
Jun 22 23:11:06.701300 archlinux kernel: nvme nvme0: pci function 0000:bf:00.0
Jun 22 23:11:06.726032 archlinux kernel: nvme nvme0: 16/0/0 default/read/poll queues
Jun 22 23:11:06.728015 archlinux kernel:  nvme0n1: p1 p2
Jun 22 23:12:28.976020 frmw13 kernel: nvme nvme0: using unchecked data buffer
Jun 22 23:12:28.977009 frmw13 kernel: block nvme0n1: No UUID available providing old NGUID
cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
3200

The 3200 comes from the Ex_Lat column of the smartctl power state table. I gave it some margin to try to ensure the drive can reach power state 3 (0.05 W seemed good enough if it mitigates the issue), but I haven’t really had time to dig into the kernel source to see whether this setting is sane:

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    10.50W       -        -    0  0  0  0      305     675
 1 +   6.5000W       -        -    1  1  1  1      330     700
 2 +   2.0000W       -        -    2  2  2  2      400     870
 3 -   0.0500W       -        -    3  3  3  3     2000    3000
 4 -   0.0035W       -        -    4  4  4  4     2000   12000
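
Whether 3200 actually admits power state 3 depends on which latency the kernel compares against `default_ps_max_latency_us`. The sketch below is not the kernel's code: it evaluates two possible readings (exit latency alone, or entry plus exit latency) against the table above, and which one applies to kernel 6.15 is left as an open assumption. The function name `deepest_allowed` is illustrative, and it assumes "-" in the Op column marks a non-operational state.

```python
# (state, operational?, entry latency in us, exit latency in us),
# copied from the "Supported Power States" table above.
POWER_STATES = [
    (0, True, 305, 675),
    (1, True, 330, 700),
    (2, True, 400, 870),
    (3, False, 2000, 3000),
    (4, False, 2000, 12000),
]


def deepest_allowed(states, max_latency_us, use_total):
    """Return the deepest non-operational state that fits the limit, or None.

    use_total=False compares only the exit latency with the limit;
    use_total=True compares entry + exit latency with the limit.
    """
    best = None
    for state, operational, ent_lat, ex_lat in states:
        if operational:
            continue  # only idle (non-operational) states are candidates
        latency = ent_lat + ex_lat if use_total else ex_lat
        if latency <= max_latency_us:
            best = state if best is None else max(best, state)
    return best


for limit in (3200, 5500, 15000):
    print(
        limit,
        "exit-only:", deepest_allowed(POWER_STATES, limit, use_total=False),
        "entry+exit:", deepest_allowed(POWER_STATES, limit, use_total=True),
    )
```

Under the exit-only reading, 3200 admits state 3. Under the entry-plus-exit reading, state 3 needs 5000 and no non-operational state fits, so it is worth checking the kernel source before relying on the 3200 value.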

I’m putting the laptop to sleep right now and will update with the results tomorrow, hoping this turns out to be a valid workaround. Again, this experiment is conducted with the BIOS AMD PCIe automatic downgrade to 3.0 already turned off.

If anyone knows how to circumvent the issue, please share your knowledge. I really hope my laptop doesn’t cook my shiny PCIe 5.0 SSD just by sitting in s2idle for a prolonged amount of time.

The previous user with such error messages had the SSD not inserted properly into the socket, which caused random disconnects.

OK, so this time the laptop was in s2idle for 7 hours, and power cycles and unsafe shutdowns went up by 3 and 2 respectively.

I’ll try re-seating it first then report later today.

~ 20 month old FW13 AMD 7040 here, seeing the same thing under Fedora:

Model Number:                       Samsung SSD 990 PRO 2TB
...
Power Cycles:                       4,795
Power On Hours:                     1,517
Unsafe Shutdowns:                   2,005

No obvious issues like hangs or filesystem/drive errors in logs.

Thx for the hint. This morning I removed the nvme_core.default_ps_max_latency_us parameter since it wasn’t doing any good, shut down, disconnected the battery via the BIOS, and re-seated the SSD. Before putting it back in, I also gently wiped the M.2 slot contacts and the SSD contacts with a cotton swab soaked in contact cleaner. Then I booted the machine, put it to sleep, and carried it on my commute as usual. No more unsafe shutdown increments after about 3 to 4 hours of s2idle.

So yeah it’s probably due to poor contact, at least as suggested by the experiments so far. The reason why the number sky-rocketed last week was probably due to the fact I commute with the laptop, which introduced more vibrations. I’ll update the post if the problem persists.

So folks, maybe try re-seating your SSD first if you’re seeing an unusual number of unsafe shutdowns in smartctl.

So it turns out the unsafe shutdown count incremented by 3 right after the laptop slept during lunch break… :frowning: I’m at a loss again.

This SSD is known to report false-positive unsafe shutdowns, unlike Western Digital drives. I have a P41 SSD and I can reliably increase the unsafe shutdown counter by adjusting the keyboard backlight during s2idle.

I have a Framework 13 with AMD 7840u, and a WD Black SN770 1TB NVMe SSD, and I also see surprisingly high “unsafe shutdowns” and “powercycles”, and also low “power on hours”, for a laptop that I’ve used daily for about 20 months. And really no filesystem corruption issues in that time (and only 3-ish lockups due to amdgpu or mediatek-wifi driver issues I think).

Data Units Read:                    1,003,980 [514 GB]
Data Units Written:                 6,466,185 [3.31 TB]
Host Read Commands:                 11,465,180
Host Write Commands:                128,004,020
Controller Busy Time:               168
Power Cycles:                       2,617
Power On Hours:                     320
Unsafe Shutdowns:                   739
Media and Data Integrity Errors:    0
Error Information Log Entries:      0

It really seems like this is just how the power management works on this system, perhaps due to PCIe ASPM? Probably the OS is confident that all writes are flushed and written, before the “unsafe” shutdown of the device? (and maybe specific to how linux manages power of NVMe storage …)

Pardon, but may I ask for further sources on this?
I’ve seen you mention this in this thread, but that’s for the Platinum P41, whereas I’m currently using the P51, and AFAIK they use a different controller and firmware.

Sorry, but I don’t have further sources; this is only from my experience.

Hi @Not_A_Name,

This is my thinking as well. Please go ahead and open a support ticket, ask to have this escalated to the Linux support team. They will ask for logs reflecting the time period for the issue.

Thx, ticket issued.

Just for the record: I haven’t yet lost hope that this turns out to be a hardware issue, since that would be a quick fix that doesn’t have to wait for a new driver to be merged upstream.

In particular, as shown in the official guide and on iFixit, the only thing that “locks” the M.2 drive in place is the screw. I say “locks” in quotes because with my particular motherboard and SSD combo, i.e. AI 7 350 and Platinum P51, there is still some slack even with the screw fully tightened, so the drive can wobble a bit, which might ultimately be the cause.

So I did an experiment last night: I cut a thin rectangular strip of thin cardboard/thick paper, drilled a hole in it, and sandwiched it between the screw and the SSD before screwing it down. It acts as a sort of soft washer, hopefully providing enough friction that the drive cannot move in day-to-day use.

Last night, after I put the paper washer in place and booted the machine, the unsafe shutdown count was 428. After a day’s worth of commuting and light browsing/coding sessions with s2idle in between, the counter did not move: it’s still 428 now.

I’ll report back later next week. Hopefully this solves it, since it’s a rather simple and cheap fix: maybe it’s just because this drive is a tiny bit on the shorter/slimmer side?

Just did an experiment. I ran `sudo nvme smart-log /dev/nvme0` to bring up SMART on my SSD; it showed 2599 power cycles and 405 unsafe shutdowns. Then I put my FL13 into suspend, waited 10 seconds, adjusted the keyboard backlight with Fn+Space, waited 10 seconds, pressed Fn+Space again, waited another 10 seconds, and pressed Fn+Space a third time. Then I resumed from suspend and ran `sudo nvme smart-log /dev/nvme0` again. Now it shows 2603 power cycles and 408 unsafe shutdowns.
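
A before/after comparison like this one can be scripted so the counters do not have to be read off by hand. This is a minimal sketch that uses smartctl's JSON output rather than nvme-cli. The helper names `read_counters` and `delta` are illustrative, and the JSON key names (`nvme_smart_health_information_log`, `power_cycles`, `unsafe_shutdowns`) are assumed to match what your smartctl version emits, so check them against `smartctl -j -A` on your own machine.

```python
import json
import subprocess

FIELDS = ("power_cycles", "unsafe_shutdowns")


def read_counters(device="/dev/nvme0"):
    """Read the NVMe health counters through smartctl's JSON output.

    Needs root. check=False because smartctl can return a non-zero exit
    status even when it produced usable output.
    """
    result = subprocess.run(
        ["smartctl", "-j", "-A", device],
        check=False,
        capture_output=True,
        text=True,
    )
    log = json.loads(result.stdout)["nvme_smart_health_information_log"]
    return {name: log[name] for name in FIELDS}


def delta(before, after):
    """Per-counter difference between two snapshots."""
    return {name: after[name] - before[name] for name in before}


# Snapshots taken from the experiment described above, not read from a drive.
before = {"power_cycles": 2599, "unsafe_shutdowns": 405}
after = {"power_cycles": 2603, "unsafe_shutdowns": 408}
print(delta(before, after))
```

On real hardware, call `read_counters()` once before suspending and once after resuming, then pass both results to `delta()`.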

Confused, what’s “suspend” here? It can’t be s2idle sleep if you’re adjusting brightness, right?

I think there is something funny with the power management going on. Here are my numbers, this is for a framework 13 less than 3 weeks old:

Data Units Read: 671,891 [344 GB]
Data Units Written: 2,062,944 [1.05 TB]
Host Read Commands: 27,234,227
Host Write Commands: 21,376,627
Controller Busy Time: 46
Power Cycles: 3,865
Power On Hours: 87
Unsafe Shutdowns: 3,778
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

framework 13, Ryzen AI 7 350, 2 TB Samsung 990 EVO Plus

Gentoo Linux, kernel 6.15.2

No, I am not power cycling my laptop 200 times per day :slight_smile:
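
To put these numbers in perspective, the counters can be turned into rates. This is a minimal sketch that parses the smartctl lines quoted above and divides by the power-on hours and by an assumed age of 21 days. The post only says "less than 3 weeks", so the per-day figure is a lower bound. The function name `parse_counters` is illustrative.

```python
import re

# Lines copied from the smartctl output quoted above.
SMART_TEXT = """\
Data Units Read: 671,891 [344 GB]
Data Units Written: 2,062,944 [1.05 TB]
Power Cycles: 3,865
Power On Hours: 87
Unsafe Shutdowns: 3,778
"""


def parse_counters(text):
    """Map 'Field Name' -> int for lines shaped like 'Field Name: 1,234 [...]'."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+):\s+([\d,]+)(?:\s|$)", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2).replace(",", ""))
    return counters


counters = parse_counters(SMART_TEXT)
hours = counters["Power On Hours"]
assumed_age_days = 21  # assumption: upper bound taken from "less than 3 weeks"

cycles_per_hour = counters["Power Cycles"] / hours
unsafe_per_hour = counters["Unsafe Shutdowns"] / hours
cycles_per_day = counters["Power Cycles"] / assumed_age_days

print(f"power cycles per power-on hour:     {cycles_per_hour:.1f}")
print(f"unsafe shutdowns per power-on hour: {unsafe_per_hour:.1f}")
print(f"power cycles per day (at least):    {cycles_per_day:.0f}")
```

Rates this high cannot come from manually switching the machine off and on, which supports the idea that the counters are being driven by power management transitions.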

In s2idle if you adjust keyboard brightness the backlight will light up briefly before turning itself off.

Update: same unsafe_shutdowns increment on Fedora 42

I can reproduce this on a ~2 years old P41.

Before:

  • Power on hours: 3563
  • Unsafe shutdowns: 1492

After:

  • Power on hours: 3565
  • Unsafe shutdowns: 1495

Update:
So the unsafe shutdown count was stable at 428 until this morning, when I opened the laptop to work and found it at 431. So yeah, it seems like there’s a firmware issue under the hood :frowning:

I don’t know what to say now…

The unsafe shutdown count was still 431 this afternoon. Now, right before bedtime, I was looking into something and decided to check the counter, and it’s at 869…

:frowning: :frowning: :frowning:

OK, this is it: my Btrfs filesystem just bricked, and I cannot even mount it from a live ISO.