[RESPONDED] 1TB expansion card disconnects randomly

I think I’m having the same problem. My computer is about two months old. In the expansion slots I have USB-C (back) and USB-A (front) on the left, USB-C (back) and a 1TB drive (front) on the right. I usually plug it in from the left USB-C with a 96W Apple charger and have nothing in the other slots. Though I’m pretty sure it’s crashed with and without being plugged in.

I recently installed Parrot Linux on the 1TB drive with Btrfs. After 1-6 hours of operation, the OS will crash and the screen will fill with journald errors complaining that it can’t write to the disk. This happens both when I’m in the middle of doing stuff and when it’s idle (I don’t have any automated power management/sleep/hibernate stuff in place).

btrfsck and hardware checks say everything is good.
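
When the filesystem checks come back clean, one more place to look is the kernel log, to see whether the drive dropped off the bus rather than the filesystem failing. Below is a minimal sketch, assuming the card enumerates as a USB device, that the kernel prints its usual "USB disconnect, device number N" message when it drops, and that `journalctl` is available. The function names are made up for illustration.

```python
import re
import subprocess

# Message the Linux USB core logs when a device leaves the bus.
DISCONNECT_RE = re.compile(r"usb (\S+): USB disconnect, device number (\d+)")


def find_disconnects(lines):
    """Return (port, device_number, line) for every USB disconnect message."""
    hits = []
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2)), line.rstrip()))
    return hits


def kernel_log_lines():
    """Read kernel messages for the current boot via journalctl (systemd only)."""
    result = subprocess.run(
        ["journalctl", "-k", "-o", "short-iso", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# Example use on a live system:
#   for port, devnum, line in find_disconnects(kernel_log_lines()):
#       print(line)
```

If the OS itself lives on the card, the journal may not survive the crash, so this is probably more useful when booted from the internal drive with the card attached as secondary storage.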

Same here. The problem occurs in any card bay, with both Btrfs and ext4 filesystems.
BIOS: 3.07
OS: Fedora 35 on the card, Win11 on the 1TB NVMe
250GB expansion card, 1 Type-C, 1 HDMI and 1 Type-A.

I recently installed Fedora 36 with the 5.17.2 kernel. No problems in ~20h of (non-continuous) uptime.

I suppose I can’t keep editing my old post with disconnect updates. I have one more to report, the first in 1.5 months. This time I wasn’t doing anything with the drive. I was just working on my computer and using the good old C:\ drive, when I saw the popup that the D:\ drive had disconnected. The event log simply says:

Log Name: System
Source: disk
Date: 5/19/2022 1:51:26 PM
Event ID: 157
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: WorkBrian
Description:
Disk 1 has been surprise removed.
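
For anyone collecting several of these entries, here is a minimal sketch of turning the pasted text into structured data so that disconnect times can be compared across reports. It assumes the entry is copied as plain "Key: value" lines exactly as above and that the date uses the US-style format shown. The function names are hypothetical.

```python
import re
from datetime import datetime

KEY_VALUE_RE = re.compile(r"^([A-Za-z ]+):\s*(.*)$")


def parse_event(text):
    """Parse 'Key: value' lines from a pasted Event Viewer entry into a dict."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = KEY_VALUE_RE.match(line)
        if match:
            last_key = match.group(1).strip()
            fields[last_key] = match.group(2).strip()
        elif last_key is not None:
            # Continuation line, e.g. the text under "Description:".
            fields[last_key] = (fields[last_key] + " " + line).strip()
    return fields


def is_surprise_removal(fields):
    """True for the disk warning shown above (Source 'disk', Event ID 157)."""
    return fields.get("Source") == "disk" and fields.get("Event ID") == "157"


def event_time(fields):
    """Parse the Date field, assuming the M/D/YYYY h:mm:ss AM/PM format above."""
    return datetime.strptime(fields["Date"], "%m/%d/%Y %I:%M:%S %p")
```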

@nrp
Is there any update from Framework on this issue? It was first raised in Dec ‘21 and you jumped into the chat around Feb ‘22.
I recently cancelled my order on the basis of this problem. It would be good to know it’s in hand.

Has anyone tried the expansion card with another Thunderbolt-equipped machine? I don’t have another to test myself. If the disconnects also happen on another machine, the issue is probably with the card itself.

I have a MacBook that I could test on, but since the problem only happens sporadically, I’m not sure what the best way to test would be in order to maximize the chance of reproducing the disconnect.

My own theory of the problem is that it’s related to a different issue: [RESPONDED] Quirky USB Port

The storage card is USB4, not Thunderbolt. I do agree that we need to find someone with a USB4 port who can check whether it’s a Framework-specific issue. I’ll ask around, thanks for the idea!

Hi everyone, just following up on this. We provided our Storage Expansion Card supplier with a setup to try reproducing this, and they weren’t able to on their end. Internally, we have been able to trigger the behavior rarely with one setup, but we also haven’t been able to create a reliably reproducing scenario for it. We’re collating the responses in this thread to create a more reproducible setup so that both we and our supplier can replicate and then root cause the issue.

Following on with that, if anyone has a consistent reproduction case for this that they can share, that would also be much appreciated.

What’s odd is that I can’t reliably get this to misbehave. It happens randomly and usually at the worst time (like in the middle of a Counter Strike game!). The only thing I can think of is getting another 1TB expansion card and seeing if it ever happens with a “new” one.

Is there a tester programme I can sign up for, for this? Rather intrigued by this issue…but I don’t have a storage expansion card to test with. (The TBW endurance is low for the price point…and the price seems to be set largely because of the form factor uniqueness)

Has anyone tried the following test case on Windows and Linux (?):

  1. Start a large file transfer, say 500GB, copying a single file to the expansion card, AND
  2. Start another large file transfer, say 500GB, copying a single file from the expansion card.
  3. Then put the laptop to sleep while both files are still copying.
  4. Wake the laptop up once it has entered sleep.
  5. It should still be copying at this point.
  6. Now hibernate the laptop.
  7. Then wake it up from hibernate.
  8. It should still be copying.
  9. Repeat steps 3 to 8 until the copies finish.

Also, potentially inject a step 6.5:
6.5: Physically remove the expansion card, and re-insert the card into the same slot.

Ideally, 6.5 shouldn’t trigger anything strange as the system is supposed to be in a hibernated off state…but you never know.

Also, there’s the question of power source changes (say, flip-flopping between internal battery and USB-PD) during file copy. As well as hotplugging other devices into / out of other slots.
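
Steps 1 and 2 of the test case above could be scripted roughly as follows, so the same run works on both Windows and Linux and the copies are checked afterwards. This is only a sketch: the function names and paths are placeholders, the sleep/hibernate steps still have to be done by hand, and the checksum read-back may be served from the OS cache rather than the card itself.

```python
import hashlib
import shutil
import threading
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst, results, label):
    """Copy src to dst, then record whether the checksums match (or the error)."""
    try:
        shutil.copyfile(src, dst)
        results[label] = sha256_of(src) == sha256_of(dst)
    except OSError as exc:
        # A disconnect mid-copy is expected to surface here as an I/O error.
        results[label] = exc


def run_bidirectional_copy(to_card_src, card_dir, from_card_src, local_dir):
    """Run one copy onto the card and one copy off the card at the same time."""
    results = {}
    jobs = [
        threading.Thread(
            target=copy_and_verify,
            args=(to_card_src, Path(card_dir) / Path(to_card_src).name,
                  results, "to_card"),
        ),
        threading.Thread(
            target=copy_and_verify,
            args=(from_card_src, Path(local_dir) / Path(from_card_src).name,
                  results, "from_card"),
        ),
    ]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return results
```

A result of `True` for both labels means both copies finished and matched. An exception object in either slot records the error that interrupted that copy.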

My instance of this would occur regularly after 3-5 days of using the OS, so just about long enough to get all of the settings, customizations, and installations done. In fact, I have an external SSD set up that has just started to exhibit the issue. With some guidance, I could grab any logs needed.

One item that might “help” this along: I would install the OS and the applicable software, and then install about 30-50 GB of Steam and GOG games onto the external SSD. On the 256 GB version, this would leave about 180 GB “free” (swap sized to match RAM at 16GB, ~10% over-provisioning).

The use case over those 3-5 days would be general use: some social media, some YouTube, and some gaming. There might be 2-3 shutdown/reboot cycles in that time, most of them during the first day to get stuff installed/updated/configured. This would be roughly 12-14 hours of use on each of the 3-5 days. The error usually cropped up during the pause/load/buffer of a new YouTube video in a queue. After the first occurrence, it would occur again within the next few hours, without warning or apparent cause, including when resuming from or entering sleep.

Items that did not seem to affect the occurrence: running a chkdsk or fsck scan on boot/reboot; swapping between the OS on the internal and external SSD; changing the port the external SSD was in; using a thermal pad installed in the 256GB model, as described for earlier 1TB models.

This reproduces semi-regularly for me, always with the same use case: hosting a WSL2 distro on the disk, so a large VHDX on exFAT. Since I don’t have a consistent repro for this, are there any useful logs/ETW traces that I can get to help troubleshoot this? I did confirm via opening it up that my expansion card doesn’t have the thermal pad, so I’m considering just trying that as well.
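
Without a consistent repro, one low-effort option is to at least record exactly when the drive drops, so the timestamps can be lined up with whatever system logs turn out to be useful. Below is a minimal sketch that polls a path on the card. The function name is made up, and it is not a substitute for proper ETW traces.

```python
import os
import time
from datetime import datetime


def watch_path(path, interval=1.0, max_checks=None, sleep=time.sleep):
    """Poll `path` and record a timestamped event when it appears or disappears.

    Runs until interrupted unless `max_checks` is given.
    """
    events = []
    present = os.path.exists(path)
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval)
        now_present = os.path.exists(path)
        if now_present != present:
            state = "appeared" if now_present else "disappeared"
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {path} {state}", flush=True)
            events.append((stamp, state))
            present = now_present
        checks += 1
    return events


# Example (placeholder path): watch_path("D:\\", interval=1.0)
```

On Linux the bare mount point directory can remain after the drive vanishes, so it is probably safer to point this at a file or folder that lives on the card itself.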

This issue is still preventing me from properly using the drive. I’m going to do some more sustained r/w and heat testing to see if it will cause issues.

Update 1: formatted and installed Ubuntu 22.04 on it as a benchmark. I did about an hour of configuration (mostly messing around trying to heat it up) before I suspended it and on wakeup was met with drive write errors and a full system crash. Took it apart and it didn’t seem too hot at all. My theory is the sustained read on system resume caused this one.

I’ve located a friend with a Thunderbolt 3 port on their laptop (14” Lenovo Yoga with an 11th gen i7), and they’re willing to do some writing in documents on the drive, so I’m handing it over to them for the next test.

This happens completely at random for me. I use the drive for storage on Windows 11, formatted as NTFS. It can happen while I am browsing the web, watching a movie, or when it’s left idle. In every case, Windows Explorer opens automatically to show the drive.

I wonder whether this is a voltage issue, i.e. something that needs a powered hub or dock. Similar to the SDR thread.

It seems that the 3.09 BIOS has corrected this issue for me. I haven’t seen it crop up since installing, whereas I would see it 2-3 times a day before. I’ve moved my game installs onto it and am using WSL on it as well, with no issues since the BIOS update. It may be early to say, but Good Job Framework!!

Thank you for reporting this, I’ll keep an eye out on mine this week.

Will install that later today. If this works I’ll be very happy!

With the reports of 3.09 “fixing” this, I tried it with the install that was doing this regularly. I logged in, updated the OS, and started updating some games, only for the screen to go blank and start throwing disconnect and read-only errors.

I’ve re-installed the OS, and have started to perform the use case described in my previous reply to see if it was a situation of once borked, always borked. Hoping that the BIOS update ‘did’ fix this, but not holding my breath considering the inability to reproduce for the manufacturing partners.