1TB expansion card disconnects randomly

New here. Batch 6 with Intel i7-1165G7. My 1TB expansion card disconnects and reconnects somewhat randomly, I think during sustained writes. Tested on Linux.

Haven’t diagnosed it properly yet. So I’ll provide some probably useless info for now.

While the card is properly locked in place, the filesystems on it will suddenly become read-only. Just before this occurs, dmesg shows a USB disconnect event, the fs failing to write a few times and becoming read-only, the USB connect event, and the reconnected block device getting assigned a new letter (sda->sdb->sdc) since the old one is still sort of in use.

So far it has only happened while actively writing to the card, such as when running emerge -NavDu @world or btrfs receive, and only while the card is in the Framework laptop itself (I have tried the bottom left and right bays). I have not yet had issues when it is plugged into my desktop, but it has not spent much time in there yet. It usually takes a while of active writing before it disconnects, typically 15+ minutes, but I have had it disconnect once shortly after boot for unknown reasons.

Tested using it from a fresh Kali Linux install on a separate flash drive, from Gentoo installed on the 1TB SSD itself, and from a fresh Arch Linux install on my NVMe.

My btrfs receive was over ssh, with the network speed around 160Mbps. The card dis/re-connected at 13Gb, within 20min.

Anyone else with similar issues? Any recs for commands I should run for monitoring and testing? I’m assuming/hoping it’s a bad flash card for now, but it did happily let my desktop write 200Gb to it (in a much shorter <5 min timespan), and let me compile/install Gentoo to it from the desktop over the course of an hour. It’s not hot to the touch, but if it’s a thermal issue, what should I do to check (I don’t think lm-sensors shows anything for the card by default)?

relevant dmesg: Framework 1TB expanson card random disconnect - Pastebin.com

@Shy_Guy see if this or this helps.

Good luck!

@lbkNhubert Thanks, I tested both sides after reading the first one, but still haven’t tested either of the top two. The second one I ruled out since the card was never hot to the touch and my write speeds inside the laptop have been limited by my wifi (no faster than 160Mbps, nowhere near 1000MBps), since all write operations were from downloading files.
I did attempt to measure the temperature via lm-sensors, but it’s not detecting anything specific for the card. Granted, I did not actually run sensors-detect on the thunderbolt ports (can anyone confirm it’s safe to probe?). But seeing how low my write speeds are, I doubt it’s overheating (plus it has worked fine at full speed in the desktop).

Guess I should test the top ports too. On one hand I don’t want to needlessly wear down the card; on the other, it’s likely either the card or the mobo is faulty and I need to stop being a puss and determine which. I’ll script something up soon.
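
A minimal sketch of what such a test script could look like, assuming GNU dd and a writable mount point on the card. The function name slow_write_test and its arguments are made up for illustration, not something from this thread. It writes small chunks with a pause between them and stops at the first failed write, which is roughly how the disconnect shows up (the fs goes read-only).

```shell
# Hypothetical helper: write 1 MiB chunks slowly and stop at the first failure.
# Usage: slow_write_test DIR [CHUNKS] [PAUSE_SECONDS]
slow_write_test() {
    dir=$1
    chunks=${2:-100}
    pause=${3:-5}
    i=1
    while [ "$i" -le "$chunks" ]; do
        # conv=fsync makes dd flush the data to the device before returning,
        # so a disconnect or read-only remount surfaces as an error here.
        if ! dd if=/dev/urandom of="$dir/chunk_$i.bin" bs=1M count=1 conv=fsync 2>/dev/null; then
            echo "write failed at chunk $i ($(date))"
            return 1
        fi
        i=$((i + 1))
        sleep "$pause"
    done
    echo "all $chunks chunks written"
}
```

Something like `slow_write_test /mnt/card 500 10` (the mount point is a placeholder) would write about 500 MiB over roughly 80+ minutes. Delete the chunk_*.bin files afterwards.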

I’m having the same issue on Windows with the 256GB card. Were you able to find a solution yet?

The problem I am having is not likely one with a solution other than a replacement. It’s just a matter of finding the time to positively identify what needs to be replaced.
Or, in my case, waiting for the other expansion card to ship in so I can hopefully rule out the mobo.

If the new card has the same issue, I will need to check if thumb drives/phone/etc. also disconnect randomly/briefly to better understand the scope of the issue. Fact is, so far the only thing I have plugged in is this expansion card, so it’s possible the USB ports just reset or something every few minutes for some unknown reason. I think I saw a post with such an issue a few days ago, but failed to save it.

I don’t actually have anything else to plug in atm other than a display for testing. I haven’t seen it flicker during use, so I don’t think it’s a general error with the mobo.

I usually have a USB headset or a dock plugged in and I haven’t noticed any issues with those, so it could very likely be the expansion card in my case. Please update and let us know.

Having a similar issue with the 250GB expansion card, which has Linux Mint installed. There is no warning, and it typically happens after a command completes successfully. I’ll be unable to open or run any programs, drives will display I/O errors, and all icons disappear as they are removed from memory. The only way out is a forced shutdown. Typically I have no other devices connected aside from my internal NVMe when this happens.

FYI: I just noticed the expansion card got disconnected when I rebooted & plugged in a docking station. Can anyone try to reproduce?

Yep, I just reproduced it again and it disconnected right after I plugged in a docking station.
FYI: This is the docking station I’m using:

https://www.amazon.com/-/es/Thunderbolt-adaptador-estación-acoplamiento-universal/dp/B07WNSP368/

Noticed the same thing with NixOS. I haven’t tried other ports yet.

Note in my case the issue manifests as a USB dis-/re-connect, as shown in the dmesg.

I actually haven’t experienced the issue again recently, but I haven’t done any long slow writes to it since then

Is yours also plugged directly into the laptop, and is it mounted/in use when it disconnects?

Update and possible good news!

I just recently got around to testing my new card. Definitely not a temp issue, and it also didn’t seem to particularly care if/how much you’re writing to the disk. Stress testing didn’t make it consistently fail faster. It only seems to fail eventually when writing, regardless of how little is written (just status log updates from bg processes are enough).

I was thinking it was a faulty mobo… but as I researched Linux USB disconnect problems in general and btrfs unsafe eject/re-insert issues, a few people seemed to suggest that instability is common with some USB controllers under Linux, and a few pointed to power management problems where USB controllers erroneously put in-use devices to sleep.
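
One way to see whether the kernel is allowed to autosuspend a USB device is to read its power/control file in sysfs: "auto" means runtime power management may suspend it, "on" means it stays powered. A minimal sketch, assuming the standard /sys/bus/usb/devices layout. The function name list_usb_autosuspend and its optional directory argument are made up here (the argument only exists so the function can be pointed at a test directory).

```shell
# Hypothetical helper: print USB devices whose power/control is "auto".
# Usage: list_usb_autosuspend [SYSFS_USB_DIR]
list_usb_autosuspend() {
    root=${1:-/sys/bus/usb/devices}
    for ctl in "$root"/*/power/control; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$ctl" ] || continue
        dev=${ctl%/power/control}
        name=${dev##*/}
        # Entries with a ':' in the name are interfaces, not devices.
        case $name in *:*) continue ;; esac
        if [ "$(cat "$ctl")" = "auto" ]; then
            echo "$name"
        fi
    done
}
```

The names it prints (e.g. 2-1) use the same bus-port form as the "usb 2-1: USB disconnect" lines in dmesg. Writing "on" to a device's power/control file as root keeps that device from autosuspending until the next reboot or replug. Whether autosuspend is actually behind the disconnects in this thread is unconfirmed.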

Potential workaround/source of issue

Seeing how power management issues were mentioned, I checked the BIOS and changed my settings from max battery to non-turbo performance.

bios -> Advanced -> Boot performance mode = Max Non-Turbo Performance

Didn’t expect anything out of it since it looked like a CPU setting to me, but I have gone a full day with no erroneous disconnects. Haven’t rebooted again yet, but seeing as I made it this long when I usually can’t get more than a few hours, I’m hopeful.

If you are also experiencing random USB disconnect issues, check if your BIOS is set to boot to max battery and switch it to performance or turbo if so. Let me know if that has any effect.

Potential STR

On a similar note, if you are bored and don’t have the issue but would like to help confirm STR (Steps To Reproduce), try the following

bios -> Advanced -> Boot performance mode = Max Battery

and see if you’re suddenly affected (the issue usually occurs for me sometime between 5 sec and 3hrs after I start using the 1TB SSD). Note the issue only becomes apparent if you’re writing to the device (since the fs goes ro). Also note that though my issue was noticed with the SSD expansion cards, it could still be a more generic USB problem affecting other devices.

If you want to passively monitor whether the issue has been triggered, you can watch dmesg for unexpected disconnect events via
dmesg --follow | grep -iE 'USB disconnect'
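
A slightly broader variant of the one-liner above, as a sketch: it also catches the reconnect and the read-only/I/O error messages described at the start of the thread. The function names are made up, and the patterns are assumptions based on commonly seen kernel messages. The exact wording varies by kernel version and filesystem, so adjust them to match your own dmesg.

```shell
# Hypothetical helper: filter kernel log lines on stdin down to USB
# disconnect/reconnect events and filesystem read-only or I/O errors.
filter_usb_events() {
    grep -iE 'USB disconnect|new .*USB device|forced readonly|read-only|I/O error'
}

# Hypothetical helper: count only the disconnect events in a log on stdin.
count_usb_disconnects() {
    grep -ciE 'USB disconnect'
}
```

For live monitoring with human-readable timestamps: `dmesg --follow --ctime | filter_usb_events`. To count disconnects since boot: `dmesg | count_usb_disconnects`.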

Finally, note I have only gone 24hrs without the issue, during low usage, and in a single boot cycle. It could just be a coincidence. I was grasping at straws for ways to conclude it might not be a faulty mobo so I wouldn’t have to send it back for a replacement. Even if this does work around my problem long term, it’s possible that max battery mode fails because I have a faulty mobo rather than because of some Linux USB controller power management driver issue.

Day 2, power cycle 2. Still no issues using the same low load as before. Tomorrow I will try booting into my Gentoo install on the 1TB SSD, which will put a full OS load on it (still kinda light usage, but a lot more than now).

I think this bios setting change is working.

Day 3. No dice. It disconnected itself randomly before I got a chance to umount day 2’s test.

With the bios setting change reliability is improved drastically, allowing me to make it through entire days instead of just a couple hours. However, it still eventually fails.

The fact that this BIOS setting change improved reliability, the fact that it still eventually fails, everything I already know from previous tests, and the fact that some people are running persistent Linux on SSD cards without issues together really pin the issue down to my mainboard. I think the increased voltage (or something similar) in performance mode is helping to mask an underlying hardware fault.

Thank you @Shy_Guy for sharing all of your tests. I really appreciate it, and I hope the problem will be solved before I receive mine.

With my laptop, I have a habit of not using the internal drive and instead using an external USB SSD key with my GNU/Fedora system fully encrypted (and another key for backups). This lets me always have my key with me, with the option of running my system on other computers/laptops if needed.

As you guessed, I want the same with the Framework: I have not ordered an internal drive, only 2x 256GB expansion cards. I hope your problem gets fixed quickly, because this will seriously affect my Framework usage too.

For folks who are seeing this occur, could you share:

  1. The BIOS version you are on.
  2. The OS you are using (if on Linux, also the kernel version)
  3. What Expansion Cards and other peripherals you have plugged into each Expansion Card bay.

  1. 3.07
  2. Windows 11
  3. 256 GB. I usually have a USB headset connected or a docking station. But it still happens when they’re disconnected.

  1. BIOS 3.06 (I will be updating soon to see if it changes)
  2. Linux Mint 20.3 5.14, also occurred on 20.2 5.12 and 20.2 5.14. Originally I would notice because my OS would stop working (I booted from my expansion card at that time), but I’ve since migrated the OS to my internal M.2 and still experience the disconnects (my auto-mount fstab rules to mount partitions at /media/docs and /media/gamedisk will suddenly be inaccessible as it disconnected and remounted at /media/b/docs and /media/b/gamedisk). I have not used the laptop undocked in Windows 10 enough to confirm whether this occurs there as well.
    EDIT: This has occurred in Windows 10, my D: and E: drives for my two external partitions on the 256GB disconnected and reconnected mid-notetaking (no data loss because they remounted in the same spots before I went to save, thanks windows!), I had no other devices plugged in except 3 USB-C cards.
  3. Just the storage expansion and three USB-C cards, no peripherals

Thanks for the information, @john_doe @Be_Far.
May I know whether the AC adapter was connected next to the SSD expansion card when the issue happened? That is, were the SSD and the AC adapter attached on the same side of the laptop?

This has happened both with charger connected and disconnected, but at the times when the charger was connected, it was on the same side as the expansion card.