[RESPONDED] 1TB expansion card disconnects randomly

So, I was thinking about this further (totally brainstorming here, and please comment if I missed something): IF the 1TB or 250GB card is a true USB 3.2 device (old name), it’ll default to 20Gbps ~= 2.5GB/s ~= PCIe 4.0 x2, correct? So I’m thinking…

The Framework Laptop 11th Gen has a total of “1x PCIe 1.0 & 12x PCIe 4.0”…
1x PCIe 1.0 = WLAN & Bluetooth
4x PCIe 4.0 = NVMe “hard drive”
4x PCIe 4.0 = Either eGPU and/or power supply (most 85-100W USB4 chargers will utilize the full bandwidth for power delivery, b/c they’re lazy in design most of the time…)
4x PCIe 4.0 = the other USB-C (USB4) port (which I have connected to a TB 3.0 docking station)
…so I’ve already used all my bandwidth without even accessing my 1TB expansion card or the internal iGPU which will also require some bandwidth (but mine’s disabled in this instance)…I mean, Framework has some great engineers to even get it to work 75% of the time to be honest! I’m guessing here…but with the 12th gen, it’s documented by Intel to have 20 PCIe 4.0 lanes, so I’m thinking this will be “fixed”…
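
To sanity-check the arithmetic in this post, here is a rough sketch. The line-code overheads (128b/130b for PCIe 4.0, 128b/132b for 20Gbps USB 3.2) are the standard encodings; the lane assignment is only the guess listed above, not a documented Framework layout, and all names in the code are made up for illustration.

```python
import math

# Raw signalling rates in gigabits per second.
PCIE4_GT_PER_LANE = 16.0   # PCIe 4.0: 16 GT/s per lane
USB32_GEN2X2_GBPS = 20.0   # USB 3.2 Gen 2x2: two 10 Gbps lanes

def effective_gbytes(raw_gbps, payload_bits, total_bits):
    """Usable GB/s after line-code overhead (protocol overhead is ignored)."""
    return raw_gbps * payload_bits / total_bits / 8

pcie4_lane = effective_gbytes(PCIE4_GT_PER_LANE, 128, 130)
usb_20g = effective_gbytes(USB32_GEN2X2_GBPS, 128, 132)
lanes_needed = math.ceil(usb_20g / pcie4_lane)

# Lane tally as guessed in the post (hypothetical assignment, not from a datasheet).
guessed_lanes = {"WLAN/Bluetooth": 1, "NVMe": 4, "eGPU/charger port": 4, "dock port": 4}
total_guessed = sum(guessed_lanes.values())

print(f"PCIe 4.0 x1: {pcie4_lane:.2f} GB/s")
print(f"USB 20 Gbps: {usb_20g:.2f} GB/s, needs {lanes_needed} PCIe 4.0 lane(s)")
print(f"Lanes accounted for in the guess: {total_guessed}")
```

So a 20Gbps link is a bit more than one PCIe 4.0 lane can carry, which is where the "x2" estimate comes from, and the guessed tally adds up to all 13 lanes.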

Just throwing out some thoughts/different perspective :slight_smile:

It’s been a long while but I am back. I am on BIOS 3.10 on NixOS, with this uname output:

Linux fwlNix 5.19.5 #1-NixOS SMP PREEMPT_DYNAMIC Mon Aug 29 09:18:05 UTC 2022 x86_64 GNU/Linux

It’s pretty consistent for me in that it always happens on the Framework laptop, in any port. Mounting RO makes it much less frequent, however, to the point that I think I am experiencing a different issue altogether (maybe power saving/wake from sleep). Mounted RW, I don’t think I have made it past 3 days with infrequent writes (basically just to bash_history), or past a few hours if I tried to run a RW OS on it like Gentoo (which typically failed before I could log in).

I still have not had any issue on my desktop, but its motherboard only has USB 3.2 Type-C, not USB4/TB. The card has run in there perfectly happily with the same programs running for months. Which is why I am happily keeping both of my 1TB expansion cards even though they are… flaky on the laptop itself.

I have not turned off USB power saving since I switched to NixOS with BIOS 3.10. I’ll give that a shot later. But glad I am no longer the only one frustrated/baffled by the issue. Sorry for the team, but… you know how confusing this is to look at and try to diagnose? I was on the verge of going insane trying to find the STR (steps to reproduce) even on my machine, which I know has the issue. Trying to search for a workaround was equally frustrating… without STR I might have to wait days before the issue recurs under normal usage conditions. I can’t even be sure the issue will occur within a few hours, whether I write to it intermittently or nonstop. And I can’t diagnose this all the time; I had to actually use the laptop eventually.
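
For anyone who wants to check the USB power-saving state first, here is a minimal sketch that reads the kernel's per-device `power/control` file from sysfs (`auto` means autosuspend is allowed, `on` means the device stays powered). The function name and the idea of filtering by ID are my own; the 13fe:6500 ID mentioned in the note below is the one from a Windows event log elsewhere in this thread, and I'm only assuming other cards report the same.

```python
from pathlib import Path

def usb_power_settings(sysfs_root="/sys/bus/usb/devices"):
    """Return {device_name: (vendor_id, product_id, power_control)}."""
    root = Path(sysfs_root)
    settings = {}
    if not root.is_dir():
        return settings  # not Linux, or sysfs not mounted
    for dev in sorted(root.iterdir()):
        vendor = dev / "idVendor"
        product = dev / "idProduct"
        control = dev / "power" / "control"
        if not (vendor.is_file() and product.is_file() and control.is_file()):
            continue  # interface entries such as "2-1:1.0" carry no IDs
        settings[dev.name] = (
            vendor.read_text().strip(),
            product.read_text().strip(),
            control.read_text().strip(),  # "auto" or "on"
        )
    return settings
```

Calling `usb_power_settings()` and looking for an entry with vendor `13fe` and product `6500` should show whether the card is set to `auto`. Writing `on` to that same `power/control` file (as root) disables autosuspend for that device until it is replugged or the machine reboots.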

Also comforting to see Windows users have issues as well… an OS/FS-agnostic problem makes the issue much more likely to be out of my hands.

If I had been able to find a consistently reliable STR I would happily write instructions, a script, whatever, and ship my Framework back to be looked at. This issue is cursed. Just be happy if it reproduces within a few hours when you want to know if it’s still there, and if it takes a few days to occur when you’re not checking whether it still exists.
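
In lieu of real STR, something like the following could at least timestamp the first failure. It is only a sketch: the file name, the 60-second interval and both function names are arbitrary choices of mine, and the read-back will usually be served from the page cache, so it is the `fsync` that actually exercises the device.

```python
import os
import time

def probe_once(directory, payload=b"x" * 4096):
    """Write, fsync and read back a small file; raises OSError on I/O failure."""
    path = os.path.join(directory, ".disconnect_probe")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # force the write out to the device
    with open(path, "rb") as f:
        if f.read() != payload:
            raise OSError("read-back mismatch")

def run_probe(directory, interval_s=60.0, max_iterations=None):
    """Probe until the first failure.

    Returns (successful_probes, seconds_elapsed, error) on failure,
    or None if max_iterations probes all succeeded.
    """
    start = time.monotonic()
    done = 0
    while max_iterations is None or done < max_iterations:
        try:
            probe_once(directory)
        except OSError as err:
            return done, time.monotonic() - start, err
        done += 1
        time.sleep(interval_s)
    return None
```

Pointing `run_probe` at the card's mount point and leaving it running would give a number for "how long until it drops" at a given write interval, which is the part that is so hard to pin down by hand.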

I’ve had a 256 GB expansion card connected to my 12th gen Framework laptop (i5-1240P) for about 4 days now and I’ve not seen a single random disconnection so far. Though I’m not using it too heavily (not booting an OS off of it, though I did use it to install Arch Linux on my laptop), just as an extra device to store some files. I don’t see any weird kernel log messages related to the storage expansion card when I run dmesg. My kernel version is 5.19.7-arch1-1.
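
For others checking their own kernel logs, a small filter like this might save some scrolling. The three patterns are my assumption about which messages are relevant to this thread (USB disconnects, read-only remounts, I/O errors); they are not an exhaustive or official list.

```python
import re

# Assumed-relevant kernel messages; adjust to taste.
PATTERNS = [
    re.compile(r"USB disconnect", re.IGNORECASE),
    re.compile(r"Remounting filesystem read-only", re.IGNORECASE),
    re.compile(r"I/O error", re.IGNORECASE),
]

def suspicious_lines(log_text):
    """Return the log lines that match any of the patterns above."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]
```

Feed it the text output of `dmesg` (or `journalctl -k`), which may need root depending on the distro's settings. An empty result is consistent with "no disconnects logged", though it does not prove none happened.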

I had another disconnect today (the first since May 19th). It’s not happening that often, despite an automated script using the drive every morning to process ~1 GB of data. But it’s not not happening either.

I guess I don’t need to report any more disconnects. I probably didn’t even need to report this one!

Log Name: System
Source: disk
Date: 9/14/2022 10:25:19 AM
Event ID: 153
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: WorkBrian
Description:
The IO operation at logical block address 0x2131ab78 for Disk 1 (PDO name: \Device\00000054) was retried.

Log Name: System
Source: disk
Date: 9/14/2022 10:25:19 AM
Event ID: 157
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: WorkBrian
Description:
Disk 1 has been surprise removed.

Edit: I know I promised that I wasn’t going to write again, but I had a surprise removal in the middle of the night last night with the standard “Disk 1 has been surprise removed” event. However, there was also another event log I haven’t seen before. Maybe it’s a clue? Or maybe it’s just what disconnecting during Windows modern sleep looks like…

Log Name: System
Source: Microsoft-Windows-USB-USBHUB3
Date: 9/20/2022 2:29:47 AM
Event ID: 196
Task Category: Surprise Removal
Level: Warning
Keywords: (1)
User: SYSTEM
Computer: WorkBrian
Description:
USB device draining system power when system is idle.
USB Device: VID: 0x13FE PID: 0x6500 REV: 0x110
Removal action failed: SkippedAsRecentIoObservered
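
If anyone wants to count these over time, here is a rough sketch that parses event text in the copy-paste format shown above and tallies Event ID 157 ("surprise removed") per month. It assumes each record starts with a `Log Name:` line, that everything after `Description:` belongs to that record, and that dates are in the US `M/D/YYYY h:mm:ss AM/PM` form seen here; the function names are mine.

```python
from collections import Counter
from datetime import datetime

def parse_events(text):
    """Split pasted Event Viewer text into a list of dicts, one per event."""
    events, current, in_desc = [], None, False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Log Name:"):
            current = {"Description": ""}
            events.append(current)
            in_desc = False
        if current is None or not line:
            continue
        if in_desc:
            # Everything after "Description:" is free text for this event.
            current["Description"] = (current["Description"] + " " + line).strip()
        elif line == "Description:":
            in_desc = True
        else:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return events

def removals_per_month(events):
    """Count Event ID 157 records, keyed by 'YYYY-MM'."""
    counts = Counter()
    for ev in events:
        if ev.get("Event ID") == "157" and "Date" in ev:
            when = datetime.strptime(ev["Date"], "%m/%d/%Y %I:%M:%S %p")
            counts[when.strftime("%Y-%m")] += 1
    return counts
```

Only the log text should be pasted in; any forum prose after a description would be swallowed into that event's `Description` field.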

I don’t think I’ve mentioned one other important thing. The drive always reconnects automatically and almost immediately. This disconnect didn’t affect my daily script in the morning. The only reason I noticed it was that Windows had left a notification up.

I’m curious to see if this issue persists with a different mainboard. I’ve been experiencing these random USB disconnects for some time now and have known that it’s a hardware issue. Might be compelling to replace my already broken board.

The problem has increased in frequency for me - 14 disconnects this month. I don’t think I mentioned this before, but the drive instantly reconnects. It’s always there when I look, even if the error dialog has just popped up.

I also wonder if anybody has had success in fixing the problem by changing the mainboard. There’s at least one report of that fixing USB problems: Quirky USB Port - #13 by Michael_Wu

Windows 11 ~ I think I may have this problem, as I often get a notification that it needs to connect, etc.

I use it as a virtual encrypted drive and often have to re-mount the drive.

However, as I use hibernate nearly all the time, I thought it may be down to that?

@Danny_Goff, your theory about the Framework running out of lanes comes off as sound to me.

To my knowledge, PCIe lanes are not dynamically allocated at runtime…it’s not like you can unallocate a lane from a port. So, that doesn’t explain the ‘disconnect’ / dropout.

Got referred here from a thread I made- I seem to be having the same issues.

OS: Arch Linux
Card: 250GB Expansion Card

Admittedly, I don’t have any ‘USB Disconnect’ messages in any of my old dmesg logs, but the behavior everyone is describing is more/less identical to what I see. At random, my FS drops into read-only mode for seemingly no reason. It can be minutes after boot, it could never happen in a boot cycle, it could be a few hours in.

It does seem to be associated vaguely with file writes. I say vaguely because I swear it’s been ‘triggered’ by ctrl-clicking a link in Chrome to open a new tab, and I’m not sure how write-heavy that is. It’s also died on a package install once, but that one was a ~7 MB install, not long after a solid 2-3 GB install within the same boot cycle.
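
Since the symptom here is a silent drop to read-only rather than a logged disconnect, a check along these lines could be run from a timer to catch the moment it happens. It is a sketch that parses text in the `/proc/mounts` format (device, mount point, filesystem type, options); the function name and the default list of filesystem types are my own choices.

```python
def read_only_mounts(mounts_text, fstypes=("ext4", "btrfs", "xfs")):
    """Return (device, mountpoint) pairs currently mounted read-only."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # "ro" must be a whole option, not a substring of e.g. "errors=remount-ro".
        if fstype in fstypes and "ro" in options.split(","):
            found.append((device, mountpoint))
    return found
```

On a live system the input would be the contents of `/proc/mounts`. A filesystem mounted with `errors=remount-ro` shows up here only after the kernel has actually flipped it to read-only.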

Would love to say my issue was resolved completely, but it’s reared its ugly head again. The issue seems more apparent now that I’m trying to use WSL on the 1TB expansion card, and my distros keep getting corrupted due to the disconnects.

Windows 11 22H2
1TB Expansion Card

Would love ANY insight into why this is happening in the first place.
Thanks!

Some additional quirks I’ve noticed recently:

When booting from the card, what most frequently triggers the disconnect is journal writes on suspend and startup. I’ll get a “journal aborted” message on the screen among other related errors.

I’ve also noticed that when I get such an issue, the longer I wait to restart after a force power off, the longer the interval before another disconnect. This leads me to believe it may be heat related, and I will be following the 1TB guide to install a thermal pad to see if it improves performance.

I would like to toss in that the new 3.17 BIOS for the 11th gen does not correct the issue.

Unfortunate. My own testing is leading me to believe it’s not a software issue, but I’ll post updates in here as I find out more.

Yeah, which is a shame… As this thread shows, a good deal of the purchases of the external drives were for things like dual boot. It was certainly my purpose with it. I was hopeful when the team reported that they were trying to reproduce the issue that they’d find something; getting less so as time goes on. Guess it’s time to just treat it as a fast/large USB thumb drive.

Adding my two cents – I came here because I just got a 1TB expansion card, intending to use it for dual-boot to separate some work from personal stuff, and sure enough, I’m seeing the problem others are reporting.

11th Gen, Core i7, Manjaro; other cards include two USB-C and one USB-A.

It’s happening pretty often, which makes the card basically useless for its intended purpose (and really, if it’s going to disconnect randomly, I’m not sure it’s useful for ANY purpose).

Just to note: This is not just a 1TB issue. Maybe once every week or two I get a disconnect.

@Michael_Scot_Shappe - sorry that you’re hitting this issue. You might try adding a thermal pad as described here: 1TB Expansion Card Throttling - Framework Guides

On the occasion that I use my expansion cards (Windows is running on the one that I use the most), I haven’t had them disconnect, but it has been a long time and I don’t know that I pushed the machine hard when booted from them.

Hopefully if you try it out it greatly reduces the frequency of the disconnects or stops them altogether.

Good luck with it!

There does not appear to be a heat issue with the card – it’s not particularly warm to the touch. Also, given how long this issue has been going on, if there really was a heat issue with the card, I would honestly expect Framework would have changed the card spec to include the thermal pad, which I gather they haven’t. Keep in mind, this is a brand new expansion module.

In fact, reading the document you link, it appears that newer cards already have the thermal pad applied (or are supposed to).