FW16 freezing under high I/O, losing connected devices, graphics issues

Hi there, Everybody!

I just got my DIY FW16 a couple of days ago. I’m super excited to make this my daily driver laptop! I especially like how all the HIDs are tool-less to install and remove. I’m starting to warm up to the expansion card concept, too!

I’ve been having trouble getting this thing stood up and ready for use. While I did make a support ticket already, I thought it would be a good idea to ping the Framework community about it. Maybe there’s some glaring thing I missed, or maybe there’s a quirk I just don’t know about.

Four days ago, I put it together and ran a RAM test. When that passed, I put Bazzite on to test out the laptop, and that’s when I noticed there were problems.

When I performed tasks that used the NVMe drive heavily, such as a download or updates, it’d crash out. Sometimes the program would, sometimes the whole OS would. I tried swapping the second-hand NVMe drive I had installed for another one, but now it freezes up or fails in spectacular ways during installation instead. Sometimes it just locks up, sometimes the install errors out, and sometimes graphical artefacts show up on screen.

After a bit more troubleshooting, I tried doing regular tasks on live media on USB, just to be sure I didn’t have two bad NVMe drives. I noticed that I’d get crashes then, too. Only this time, before it locked up, I was able to catch an error saying it had lost access to the USB disk.

I’ve tried bumping up the BIOS to v3.03, but that didn’t really help.

I’m really torn. I keep leaning towards this being a motherboard fault, but my sensibilities say that shouldn’t be the case, as it was surely tested before shipping. I saw suggestions involving popping the cooler assembly off the motherboard to check for thermal paste issues. But, that was for the 13, and with how wrenched down this board’s cooler is, I’ve got cold feet trying to pop it off unless I have real good cause, and/or Framework’s blessing.

Has anybody had symptoms like this before? If you have, what did you find out? I’d love to hear the community’s two cents on this.

Thanks a bunch!

Have you checked to see if your SSD firmware is up to date?

Good catch! I forgot to mention that.

For both drives I tested, there wasn’t any new firmware available. The first was a second-hand Kingston from a dead computer (I forget the exact model); the second is a new TeamGroup MP44L. Admittedly, I don’t have either of the WD SSD models that Framework sells.

What’s more, I booted a live USB, and downloaded a file to another USB stick, skipping the NVMe altogether, and I still had freezing and crashing issues.

I haven’t had crashes with my NVME drive, but I am having all sorts of problems with USB disconnects.

I just booted into Fedora 40’s live USB environments (both the standard GNOME workstation distro as well as the KDE spin) and I also noticed that the USB drives disconnected and caused the system to become as unstable as one would expect when the root filesystem becomes inaccessible.

It seems to be triggered by some combination of repeated power supply disconnects and reconnects, and repeated system suspends/resumes.

This really seems like a hardware and/or firmware issue on the mainboard, especially since I see similar unreliability even just in the BIOS.

So far I have had no issues with crashes or lockups during heavy disk writes to either USB or NVMe.

I think you may have a defective MB. Hopefully support can further troubleshoot.

With both the Live USB environment and an OS installed on 2 different NVME drives bombing out, I really don’t think it’s any of the disks. If you can, I would run disk tests on those drives on a different laptop or PC (known working one) and see if they come up with any errors. It’s a long shot, and I even doubt you’d find any issues as I type this just from your description of symptoms.
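If a dedicated disk-testing tool isn't handy on the other machine, a rough write-then-read-back check can be sketched in Python. This is only an illustration, not a replacement for a proper drive test: the function name `write_and_verify` and the default sizes are made up for the example, and the read-back may be served from the OS page cache rather than the drive itself, so a pass is weak evidence. A mismatch on a known-good machine, though, would be worth following up.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_bytes=64 * 1024 * 1024, chunk=1024 * 1024):
    """Write random data to a temp file in `directory`, read it back,
    and return True if the SHA-256 digests of both passes match."""
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                block = os.urandom(min(chunk, remaining))
                written.update(block)
                f.write(block)
                remaining -= len(block)
            # Ask the OS to push the data out to the device before reading back.
            f.flush()
            os.fsync(f.fileno())

        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                read_back.update(block)
        return written.digest() == read_back.digest()
    finally:
        # Always clean up the scratch file, even if something failed mid-way.
        os.remove(path)


if __name__ == "__main__":
    # Point `directory` at a folder on the drive under test (assumption: it is mounted and writable).
    print("match" if write_and_verify(tempfile.gettempdir()) else "MISMATCH")
```

Pointing `directory` at a mount on the drive being checked, and running it a few times with a larger `size_bytes`, gives a slightly better chance of tripping over a flaky spot.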

I suppose a status update is in order, and oh boy when it all clicked I felt like a doofus.

Long story short, Framework support identified that it was bad RAM.

But I already tested the RAM after putting the laptop together, and it passed. So I tested it again…

…It failed testing.

Still didn’t believe it. So I gutted my work laptop and swapped the RAM between the two. The work laptop wouldn’t even POST with my stick, while the FW16 ran just fine with the work laptop’s RAM.

So then it clicked, it all makes sense now.

The RAM failed less than a day after I tested it, and installed an OS. Like, 8-10 hours after. Re-checking the RAM never crossed my mind because of how improbable that is.

Nobody’s fault, mind you. Just bad luck.
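For anyone curious what a re-check boils down to, here's a minimal Python sketch of the write-a-pattern-and-read-it-back idea. It's an illustration under assumptions, not the test that was actually run here: the names `pattern_test` and `find_mismatches` and the choice of patterns are made up for the example, and a user-space script only touches whatever memory the OS happens to hand it, so a clean result proves very little. A bootable memory tester is still the right tool.

```python
def find_mismatches(buf, pattern, chunk=1 << 20):
    """Return the byte offsets in `buf` that do not equal the byte `pattern`."""
    bad = []
    for start in range(0, len(buf), chunk):
        block = buf[start:start + chunk]
        # Fast check first: only walk the block byte by byte if something is off.
        if block.count(pattern) != len(block):
            bad.extend(start + i for i, b in enumerate(block) if b != pattern)
    return bad


def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each pattern in turn and read it back.

    Returns a dict mapping pattern -> list of bad offsets (empty dict if clean).
    """
    buf = bytearray(size_bytes)
    failures = {}
    for p in patterns:
        buf[:] = bytes([p]) * size_bytes  # write pass
        bad = find_mismatches(buf, p)     # read-back pass
        if bad:
            failures[p] = bad
    return failures


if __name__ == "__main__":
    result = pattern_test(16 * 1024 * 1024)  # 16 MiB, purely as a demo size
    print("no mismatches found" if not result else f"mismatches: {result}")
```

The alternating patterns (`0x55`, `0xAA`) flip every bit between passes, which is the usual reason testers cycle through several patterns rather than writing zeros once.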

So, all of the symptoms above in the original post? Bad RAM.

Graphics corruption? That’s because I’m using the integrated graphics, which uses 1-4GB of the 32GB stick. I had it set to 4GB.

OS crashing the next morning? Can’t load files from disk? Random processes crashing? The previously loaded data had a big ol’ hole in it.

Failed installation? Since the installer was Linux, it’s live media, so it would fail once more of the RAM got utilised to handle installation.

BIOS and BIOS update running fine? Never touched the bad area.

Live media losing its source disk? Miracle it loaded at all, frankly.

The RAM’s getting RMA’d now, but I’ve just bought a stick anyway. What a perfect excuse to double the RAM.

I would have thought the ECC capabilities of the RAM should have caught this. I guess the ECC that is there is, for practical purposes, useless.

It’s not real ECC: there is no reporting to the CPU or the OS.
That’s why real ECC (only available with a Pro APU or some Xeons) is really useful.

Not going to lie, I feel big-kid ECC would’ve made this whole situation a lot less painful.

But, them’s the breaks, I suppose. Consumer grade hardware doesn’t get these features, and there’s nothing I can do about it. I know there’s corporate reasons surrounding stuff like this, but…

~le shrug~