The SSD on my Framework Laptop 16 failed three times in six months. Framework replaced the disk each time, and after the third failure they also replaced the mainboard, the apparent cause of the failures. Once I realized I had a recurring SSD issue, I developed an app for tracking NVMe SSD health. It’s a Python app for Linux, and it’s available here for anyone who’d like to use it.
It consists of two components:
- A systemd service that monitors SSD health using the nvme-cli Linux package and periodically writes data to a log file.
- A command-line client that displays a summary of SMART health data and a histogram of disk temperatures. The client can also run headless, installed as a systemd service, monitoring the log files in the background and sending configurable email alerts.
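As a rough illustration of the collector side (not the app's actual code), here is a minimal sketch that reads SMART data with `nvme smart-log` and flattens a few fields into one JSON log line. The function names and the log record layout are hypothetical. The JSON field names and the Kelvin temperature are what nvme-cli emits on the versions I'm aware of, so treat them as assumptions and check your own output.

```python
import json
import subprocess


def read_smart_log(device: str) -> dict:
    """Run `nvme smart-log` and return its JSON output as a dict.

    Usually needs root. `device` is something like "/dev/nvme0".
    """
    result = subprocess.run(
        ["nvme", "smart-log", device, "--output-format=json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def to_log_record(device: str, smart: dict, timestamp: float) -> str:
    """Flatten a few SMART fields into one JSON line (hypothetical layout)."""
    record = {
        "ts": int(timestamp),
        "device": device,
        # Assumption: JSON output reports the composite temperature in Kelvin.
        "temp_c": smart.get("temperature", 273) - 273,
        "percent_used": smart.get("percent_used", 0),
        "media_errors": smart.get("media_errors", 0),
        "critical_warning": smart.get("critical_warning", 0),
    }
    return json.dumps(record, sort_keys=True)
```

A collector loop would call `read_smart_log()` on a timer and append the result of `to_log_record()` to a log file; one JSON object per line keeps the log easy to prune and to parse from the client.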
There are several moving parts, but I think it will be straightforward to install. It discovers NVMe drives automatically, lets you sort and scope the temperature histogram, and lets you tab between separate display pages for each drive. Configuration parameters control automatic log pruning and archiving. You can also debounce email alerts, receive periodic “healthy” notifications, and get a notification if the collector service stalls or fails.
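To make the alerting ideas concrete, here is a minimal sketch of debouncing and stall detection. It is not taken from the app: the class, function, and parameter names are all hypothetical, and the app's real configuration options may work differently.

```python
from dataclasses import dataclass, field


@dataclass
class AlertDebouncer:
    """Suppress repeats of the same alert within a cooldown window."""

    cooldown_s: float = 3600.0  # hypothetical default: one alert per hour
    _last_sent: dict = field(default_factory=dict)

    def should_send(self, key: str, now: float) -> bool:
        """Return True if an alert for `key` may be sent at time `now`."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # still inside the cooldown: stay quiet
        self._last_sent[key] = now
        return True


def collector_stalled(last_record_ts: float, now: float,
                      interval_s: float, missed: int = 3) -> bool:
    """Treat the collector as stalled after `missed` intervals with no new log record."""
    return now - last_record_ts > interval_s * missed
```

Keying the debouncer on something like device plus alert type means a media-error alert on one drive does not silence a temperature alert on another.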
I had fun writing it, and hopefully some of you will find it useful. Here’s a screenshot of the command-line client. As the media_errors count shows, this was the beginning of the third (and hopefully final) disk failure. The health-score is a 0-100 roll-up score based on a simple algorithm of weighted SMART data values.
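For a sense of what a weighted roll-up like the health-score can look like, here is a minimal sketch. The weights and the choice of fields are made up for illustration and are not the app's actual algorithm; the field names follow the nvme-cli JSON output as I understand it.

```python
def health_score(smart: dict) -> int:
    """Return a 0-100 score from SMART values using illustrative penalties."""
    penalty = 0.0
    # Wear: up to 50 points as percent_used approaches 100 (hypothetical weight).
    penalty += 0.5 * min(smart.get("percent_used", 0), 100)
    # Any media error is a strong warning sign (hypothetical weight).
    if smart.get("media_errors", 0) > 0:
        penalty += 40.0
    # A non-zero critical warning field means the drive itself is complaining.
    if smart.get("critical_warning", 0) != 0:
        penalty += 30.0
    # Clamp so that stacked penalties never push the score below zero.
    return max(0, min(100, round(100 - penalty)))
```

With these example weights, a lightly worn drive that has started logging media errors drops to roughly the middle of the scale, which is the kind of early signal a roll-up score is meant to surface.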
