[RESPONDED] 1TB expansion card disconnects randomly

After weeks of not really having an issue, I just had what I might call an extreme version of it.

Specifically, I went to reboot the machine, and suddenly, the expansion drive was not visible even to the BIOS!!!

As far as I can tell the only answer to “how did I fix it” was “voodoo”. I threatened it with a dead chicken and the next time it rebooted, the drive was visible again.

This is the first time I can think of, though, where the drive refused to connect right from the start.

This problem certainly reaches the level of mysticism for me. For instance, here’s a setup I’ve found where the drive never disconnects (at least for the last month of connecting it this way):

However, when I try a shorter USB-C cable, it disconnects all of the time. In fact, when it disconnected yesterday, it wouldn’t automatically reconnect. Instead there were the following errors in the Event Log and Device Manager, and I had to reboot to fix it:

Event Log Error: Windows failed to start the USB xHCI Compliant Host Controller for the following reason: Controller reset timed out. Check with your computer manufacturer for an updated firmware for the controller.

Device Manager Error on the first of the two “Intel (R) USB 3.10 eXtensible Host Controller - 1.20 (Microsoft)” Universal Serial Bus Controllers: Windows has stopped this device because it has reported problems (Code 43).

I have a USB-A drive which disconnects frequently on the right hand side USB-A port (like when I’m backing up the computer). The left port seems more solid, except the only way to get the drive to show up in the left USB-A port is to plug it in halfway, wait for Windows to recognize it, and then I can plug it in all of the way. This has nothing to do with the USB-A cards, as I can swap them around without changing how each side behaves.

I do a little experimenting every day. For instance, now I’m attempting to plug the USB-A drive in on the right hand side halfway (even though I don’t need to), and it seems more stable. Of course, given how flaky the patterns are, I could find any number of patterns involving what shirt I wear or dances I perform while plugging devices in.

I have already replaced the mainboard and the 1 TB card and sent the old ones back to Framework. I don’t know if they’ve been able to reproduce the problems on their end, but it would be interesting to know if they couldn’t reproduce the disconnect on a Windows 10 machine using my old Mainboard and 1 TB card. If they can’t, then maybe it’s purely a software problem (or something to do with the negative vibes in my apartment).

Some new anecdata:

For the most part, once I’m booted off of /dev/sda (aka the 1TB expansion module), it works without a hitch.

Occasionally, though—usually after I’ve done an update, but sometimes if I’m rebooting for other reasons—the Framework (11th Gen Core i7, latest BIOS) refuses to see the module, right from the get go. BIOS doesn’t see it as a device, let alone a bootable drive.

The most recent time that happened, simply popping the module out and re-seating made no difference. It was stubbornly being ignored. I wound up swapping it out of the front-left bay to the back-left bay, mostly on a lark.

That worked.

Of course, it could be coincidence. One way or another, there is DEFINITELY something weird and I’m pretty sure it’s a hardware problem, or else a deep firmware problem.

I’m surprised this hasn’t been brought up here yet, but this could be due to the card not responding correctly or fast enough to UAS commands on Linux (not sure about Windows). For Linux specifically, this may be solved by adding the following to the kernel command line in GRUB (GRUB_CMDLINE_LINUX or GRUB_CMDLINE_LINUX_DEFAULT):

usb-storage.quirks=13fe:6500:u

That will disable UAS for this device, which may sacrifice a bit of speed but should be much more stable. I personally have had random disconnect issues on various machines with various external NVMe adapters, and this fixes it for all of them.

Note that this is specifically for the card the OP used, as the USB vendor and product IDs were pulled from the dmesg output they provided. If this is needed for another adapter/card, you may need to replace the “13fe:6500” part with the information from a dmesg line that looks like this:

[<    0.017413>] usb 2-2: New USB device found, idVendor=13fe, idProduct=6500, bcdDevice= 1.10
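
For anyone adapting this to a different adapter, here is a minimal Python sketch of pulling the IDs out of a line like the one above and building the parameter. The function name `quirks_parameter` is made up for illustration, and the sample line is the one from this thread with the timestamp dropped.

```python
import re

# Matches the vendor/product IDs in a kernel "New USB device found" line.
_ID_PATTERN = re.compile(r"idVendor=([0-9a-fA-F]{4}),\s*idProduct=([0-9a-fA-F]{4})")


def quirks_parameter(dmesg_line: str) -> str:
    """Build a usb-storage.quirks kernel parameter from one dmesg line.

    The trailing 'u' flag asks usb-storage to ignore UAS for that device.
    (Hypothetical helper name; not part of any existing tool.)
    """
    match = _ID_PATTERN.search(dmesg_line)
    if match is None:
        raise ValueError("no idVendor/idProduct pair found in line")
    vendor, product = (part.lower() for part in match.groups())
    return f"usb-storage.quirks={vendor}:{product}:u"


line = "usb 2-2: New USB device found, idVendor=13fe, idProduct=6500, bcdDevice= 1.10"
print(quirks_parameter(line))
```

The printed string is what goes on the kernel command line. After editing /etc/default/grub the GRUB config has to be regenerated (the exact command varies by distro), and as far as I know `lsusb -t` should then show the device bound to usb-storage rather than uas.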

As I understand it, this is a kernel-level issue. As described above, I occasionally “lose sight” of the drive at the BIOS level, suggesting that it is not an operating system issue at all!

Will this work if the USB drive is the boot drive?

It should work for boot drives as well.

I have some new actual evidence.

After my machine ran itself a bit hot (pipewire went haywire, still not sure why), I shut down, waited a few seconds, and then rebooted, resulting in the errors below and a drop into emergency mode. So, it found the drive, booted it, but then lost it again. Below is the photograph I managed to take of the screen at the time.

After shutting down again, and waiting a couple of minutes for it to cool down, I was able to boot normally.

We recommend using this as a media or backup drive, not as a boot drive.


PSA - this is to be used as a media drive or a backup drive (TimeShift, Deja Dup, or rsync from the CLI). Please don’t use this as an install drive. It’s USB and not likely to be a good time.


Except for the disconnect issues, speed has actually not been a problem at all. It ought to be a perfectly fine boot drive, honestly…except that the USB is not reliable!

And…here’s the thing.

I need an external boot drive. There are reasons, and I don’t need to go into them. It’s a requirement for something I’m doing.

Now, I can just get an actual USB external drive–there are some high-quality portable SSDs in the world, now, that would probably serve adequately.

But if the USB is not going to be reliable, then that’s not a solution either.

I would also point out that the advertising copy for the module in the marketplace explicitly says that it is usable as a boot drive.

Screenshot_20230217_143812


This is one of the reasons I prefer to use it for data storage.

I hear that. I have actually had use cases where I was in the same boat. There are things we can try, but running an OS off it is purely as-is.

This needs to be corrected, as with Windows it’s not officially supported: Windows 11 Won't Reboot but can Shutdown - #7 by TheTwistgibber

It’s doable for Linux, but it’s best seen as your mileage may vary.

All of that said, your errors appear to be bad sectors, and we can attack it this way:

  • Understand that if you don’t have any data backed up from this drive, now is the time, because there is always a risk of data loss. Boot a live USB, attach a secondary USB storage device, and back up Home.

  • sudo umount /dev/sdb1 then sudo fsck -y /dev/sdb1 (or 2, or whichever partition is applicable). fsck checks for errors, and -y answers yes to every repair prompt so they get corrected.

This process will be slow, very slow.
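
Running fsck against a filesystem that is still mounted can do damage, so it may be worth confirming the unmount took effect first. A minimal Python sketch, assuming the /proc/mounts text format (source device in the first column); the function name and the sample text are made up for illustration:

```python
def mounted_partitions(device: str, mounts_text: str) -> list[str]:
    """Return mounted sources that are the device itself or one of its partitions.

    Uses a simple prefix match, so "/dev/sdb" matches "/dev/sdb1";
    that is an assumption that suits sdX-style names only.
    """
    found = []
    for entry in mounts_text.splitlines():
        fields = entry.split()
        if not fields:
            continue  # skip blank lines
        source = fields[0]
        if source.startswith(device):
            found.append(source)
    return found


# Illustrative sample in /proc/mounts format, not taken from a real machine.
sample = """/dev/sda2 / ext4 rw,relatime 0 0
/dev/sdb1 /mnt/data ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw 0 0
"""
print(mounted_partitions("/dev/sdb", sample))
```

On a real system the text would come from reading /proc/mounts; an empty list suggests nothing on that device is mounted any more.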

Faster approach:

  • Boot to a live USB, open GParted, and create a brand new partition on the device.

  • Reinstall fresh (although I am not a fan of trusting USB to be flawless for OS usage).

There are no bad sectors here. The bad-sector errors were because the USB-disconnect was kicking in after it tried to boot. It later booted cleanly. I’m using ext4 as my filesystem, which has been nicely resilient in terms of journaled recovery when I have in-flight issues.

Also, as I’ve pointed out, the device sometimes is invisible to the BIOS, not just the operating system.


This may be a bad card. Please reach out to support for help with this if it’s not showing in BIOS. Please indicate that it’s not showing up reliably.


Except…USB needs to be reliable, right? I mean, on a device where all the ports are really USB-C under the hood, USB needs to be pretty much gold-plated perfection.

Look, I wanna be clear here. I’m not trying to be combative. I love Framework, and I love what it’s doing. I want to shout Framework’s name to the heavens and tell all my friends to buy at least one. Except that if the USB is flaky, then I really can’t recommend it to anyone but hobbyists who don’t mind occasional flakiness.


Sharing my experiences with Linux USB booting in general, not merely on the laptop. Booting an OS from USB is doable, but in my personal experience it is not recommended on any computer (edit) outside of live booting a distro. I’ve done it across a spectrum of operating systems and computers. It’s flaky and, while doable, is not something I recommend based on those experiences.

So hopefully that clears that up. That’s my recommendation. :slight_smile:

Moving forward, in your case, you will want to reach out to support for an expansion card replacement.

You definitely will want to have the card replaced.

I have to disagree here; you must be doing something wrong. I’ve run various distros including Arch, Debian/Ubuntu, and NixOS off of external USB drives for years without issue, and these are actual installations, not live images.

For instance, I normally run Arch and don’t like having 32-bit libraries installed, and I also don’t like how some games litter my filesystem with files and directories in inappropriate places, like directly in my home folder. So, I solved this problem by using an external drive for a “gaming” installation, which has Steam, Lutris, and other things set up just right, and is used only to play games from Steam, GOG, and some one-offs.

This not only solves my 32-bit and litter file problems (as I don’t care where it puts files in this installation…), but I can also move this installation from machine to machine by just plugging it in and booting off of it. If I feel like playing on my laptop, I can do that. Do I want just a little more performance? I can plug in my eGPU. Or do I want to go “balls-to-the-wall”? Well, then I can boot it up on a more powerful machine and use that instead.

I’ll run a machine for hours that way without issue, and haven’t had an issue with disconnects.

And this is also using generic NVMe-to-USB adapters of dubious quality found on Amazon, with SSDs from various manufacturers including Samsung and SK Hynix, but also sometimes Inland and other lower-tier manufacturers.

You should be able to boot and run a Linux distro stably off a USB stick; if you can’t, there’s something wrong with either your computer or your USB stick (or possibly even your cable).


I also have not had any problems over the years running Linux off of a USB drive. What I have run into is needing to remember to exempt the drive from autosuspend power rules. My first suspicion in these situations is that power management is for some reason suspending the drive (i.e. TLP autosuspends it), but another potential cause is insufficient power. I am very interested in what the total power draw is on the laptop while stress testing in an OS running on the expansion card drive. Since the 11th gen processors max out at 60 W and the Framework power supply is 60 W, I suspect there may be a situation where this causes the errors seen.

I had very similar errors show up while testing underpowered docks on my 12th gen Framework. These errors did not show immediately, but only cropped up after multiple tests with a variety of peripherals connected. Afterwards they would persist until I either removed and reseated the expansion cards, or disconnected the battery in the BIOS and then held the power button down for 30 seconds after having plugged the power delivery back in.

I do attribute some of this to firmware that needs improvement, but I think the main culprit is power spikes short enough that the switch from power supply to battery and back happens frequently enough to make the firmware miss a power event, so that it gets scrambled, for lack of a better word, and locked in the wrong persistent state.
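
To see whether autosuspend is even in play for a given drive, something like the following Python sketch can list each USB device's power setting. It assumes the usual Linux sysfs layout under /sys/bus/usb/devices, where each device directory has `idVendor`, `idProduct`, and `power/control` files; the function name is made up for illustration.

```python
from pathlib import Path


def usb_power_control(sysfs_root: str = "/sys/bus/usb/devices") -> list[tuple[str, str, str]]:
    """List (device name, 'vendor:product', power/control value) per USB device.

    A control value of 'auto' means the kernel may autosuspend the device;
    'on' keeps it awake. (Hypothetical helper; assumes the sysfs layout above.)
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not Linux, or sysfs is not mounted
    results = []
    for device in sorted(root.iterdir()):
        try:
            vendor = (device / "idVendor").read_text().strip()
            product = (device / "idProduct").read_text().strip()
            control = (device / "power" / "control").read_text().strip()
        except OSError:
            continue  # interface entries such as "2-2:1.0" lack these files
        results.append((device.name, f"{vendor}:{product}", control))
    return results


for name, ids, control in usb_power_control():
    print(f"{name}  {ids}  power/control={control}")
```

If the expansion card (13fe:6500 in the OP's dmesg output) shows `auto`, that ID could then be added to whatever exclusion list the power-management tool in use provides.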
