BOOT/RAM Issue -- My Memory Isn't What it Used to Be (and it wasn't that great to begin with!)

Greetings - I received my 12th gen i5 DIY edition Framework about a week ago in the mail; I’ve been generally pleased with it so far, but I am having one vexing issue. Wanted to field this here before reaching out to support, to see if there is anything I’m missing, and also to make it visible to anybody else having the same issue. I am seeing some similar things on here, but nothing that has produced a solution, as of yet.

EDIT 09-04: I’ve added updates below on the state of this situation, to keep this post up to date; in short, have gone through an RMA for the problem RAM to no avail. Also, edited original post body a bit for clarity & navigability (collapsed sections, slight re-order etc), as it was a bit… ahem, on the verbose side.


Here’s a summary of the situation:

the problem

So, I ordered my RAM and boot drive from vendors other than Framework, and the RAM was going to come later than everything else; being the impatient millennial that I am, I grabbed a small/cheap kit of DDR4 at my local MicroCenter to run with until the actual kit came in, as I wasn’t sure how long it would actually take to be delivered. Stopgap kit worked like a charm (if a bit dinky); we booted and got Fedora 36 installed without issue, wonderful. All in all, pleased as punch. Fast forward 4 days: my actual kit comes in - sooner than I expected - woohoo! We swap it out… but no boot. Hmmm. Try a re-seat, no dice. Poked around on these forums and saw some similar issues people were having, tried various solutions (suggestions from forum, the guides, knowledge-base articles; more details below), but still can only boot with this initial, lower-end kit of RAM.

TLDR: Nice RAM doesn’t work as it should, but lower-end RAM does.


Want to note some potentially relevant details:

baseline details

The un-bootable kit in question was selected from this official list of tested RAM kits, 2x the Crucial CT16G4SFD832A; this is a 32GB pair of 16GB DDR4 sticks, running at 3200. The dinky, bootable memory kit is also from Crucial, 8GB worth of CT4G4SFS824A, 4GB sticks of DDR4 running at 2400. Other than being kinda pokey, seems to have no issues. Other things, just to fill out the picture: boot drive is a fresh WD Black SN850, with standard Fedora 36 workstation. I did opt to encrypt the disk during the graphical install (from live usb); I believe it probably uses LUKS but I’m not actually positive. As of this writing, this page indicates the factory UEFI/BIOS is the most up to date (3.04, and yes, that is what is installed, of course.) I’ve only changed one setting in the actual BIOS, turning the power button LED brightness to lowest possible. Other than that, everything is straight-up stock I believe. EDIT: Forgot to put in the original post, I also disabled the auto brightness adjust in favor of FN buttons, as that was suggested in the Fedora install guide.


What does it actually do?

bad behavior

The non-bootable 32GB kit (in any/all variations attempted), consistently does this upon power-on:

  • a few moments of RAM training (green side LEDs), maybe up to a minute
  • a pause period, perhaps 10 seconds or so. if charging: orange lights, if battery full: white lights, if non-plugged: no lights. (So, just the same as normal operation)
  • the diagnostic & post code sequences as described here
    – all diagnostic codes: green
    – orange divider blink
    – post code: bbgggbbg (that is, 11000110 or 0xC6, I believe)
  • stays powered on, but no lights (other than expected charging light), fan is running (if left in this state, keeps getting warmer, fan louder.) At no point in the whole sequence does anything appear on display
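
For anyone who wants to double-check the blink-to-hex conversion in the post code bullet, here is a minimal sketch. It assumes blue counts as a 1 bit, green as a 0 bit, and that the first blink is the most significant bit; that is just how the sequence is read in this post, not something confirmed against Framework documentation. The function name is made up for illustration.

```python
def decode_post_code(blinks: str) -> int:
    """Turn a blink sequence like 'bbgggbbg' into an integer.

    Assumptions (not confirmed): 'b' (blue) is a 1 bit, 'g' (green) is a
    0 bit, and the first blink is the most significant bit.
    """
    if len(blinks) != 8 or set(blinks) - {"b", "g"}:
        raise ValueError("expected exactly 8 blinks, each 'b' or 'g'")
    bits = "".join("1" if c == "b" else "0" for c in blinks)
    return int(bits, 2)


code = decode_post_code("bbgggbbg")
print(f"{code:08b} -> 0x{code:02X}")  # prints "11000110 -> 0xC6"
```

If the bit order or color mapping is actually the other way around, the resulting code would be different, so treat 0xC6 as tentative.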

Any combo of the 32GB kit does this exact same thing, very consistently. The post code (starting from when I realized that was something to pay attention to, maybe 2 or 3 attempts in) is always the same. I saw somebody post this list of post codes… Haven’t dug deep enough to know if those are actually current (being from 11 months ago, pre-12th gen and presumably a few BIOS iterations), but might be worth considering. This is what I believe would be the corresponding code (unless I am misunderstanding how those codes are presented/formatted):
#define S3_BEFORE_RUNTIME_BOOT_SCRIPT 0xC6 // Restore system configuration stage 2
Now, this is interesting to me. Assuming this S3 is referring to ‘suspend’ state, I am wondering if there may be some conflicting, latent state floating around in the system, expecting the initial kit of RAM I installed, or something along those lines. I have been suspending via Fedora/GNOME power widget. However, I did do the full mainboard reset, removing the RTC battery etc with no change to the above behavior, so I don’t know how likely that is. Also worth noting, I have not seen this exact code noted by anybody else elsewhere on the forums. Mysterious…


The things I’ve tried:

troubleshooting

Roughly in order (but maybe not precisely), I have:

  • Re-seated the RAM (initial thing to try, and subsequently many times due to the following solutions)
  • Swapped the sticks to opposite channels
  • Tried to boot with only 1 stick (tried both, in all channel/stick configs. I also tried this with the dinky kit; that works.)
  • “boot” with no boot drive (dinky kit “boots” into BIOS-looking screen that effectively says “yo dog, you got no bootable media, give it another go.” The 32GB kit behaves as it does in all other scenarios.)
  • did the mainboard reset described in this official guide (followed this very carefully/faithfully, I believe)
  • variants of several above approaches, but jumping into either BIOS, boot loader prompt, or external boot media (usb stick) rather than just booting from disk. Always, the behavior with 32GB kit is consistent, regardless of any of those variables. I’ve also tried it a couple times not plugged into wall power, but the great majority were done plugged in. (The stopgap, 8GB kit, dependable as it is unimpressive, has consistently responded the way you would expect on paper to these various scenarios.)

EDIT new attempts 08-19:

  • as suggested below, tried resetting the embedded controller, behavior is exactly the same
  • disabled quick boot and quiet boot in UEFI menu; dinky 8GB reacts as you would expect (text on screen during boot process), but the 32GB kit still does not show anything (same post code, as well)

EDIT2

  • As suggested below, tried booting with mixed SKUs (1x4 with 1x16 stick) and that actually boots! 20G appear available, pass memory tests, but when going back to just the 1x16 sticks, reverts to previous behavior. Very odd… See comments below for details.
EDIT 09-04

EDIT 09-04: Framework support worked with me to determine possibilities, and ultimately guided me to go through with an RMA; I’ve now received a replacement kit (2x of SKU CT16G4SFD832A, same as before). Sadly, it seems that the behavior is exactly the same with these new sticks, which is unfortunately not super surprising.

I tried several of the solutions from the original run (collapsed in “troubleshooting”) with this new 32G kit, to be thorough, and all consistently gave the same output as before. Not much to report there, but perhaps worth noting that all post codes were the same, and when booting with 1x4G and 1x16G stick, new sticks passed memtest, just as before.

Additionally, I tried disabling SecureBoot in UEFI, because I realized I hadn’t tried that on the original go. (This did not yield results.)

I am reaching back out to Framework support to see what they suggest at this point.

EDIT 09-11: Ok. So following up from 09-04 update, here is the situation: FW support sent me a new mainboard, which was groovy of them. It arrived earlier today, and I swapped it out a little bit ago, which went fine. Cutting to the chase, the behavior is exactly the same with the new board. A couple things that might be worth jotting down here follow.
On the initial boot with the new board (first power-on), this was the configuration, just to be explicit:

  • New main board, recently swapped in (i5 1240p), BIOS/UEFI untouched
  • Replacement 32GB RAM kit (2x CT16G4SFD832A) from RMA
  • Same boot drive, SN850 w/ Fedora 36 installed
  • Everything else the same (chassis, all the stuff in it like wifi module, EC, etc.)

I tried it a few times without changing any of that stuff, just to, you know, be sure. After it became clear it was exactly the same (same post code, everything), I tried these things:

  • reset the EC, as I had done in tests with the original board (no change)
  • I have a fresh NVMe drive sitting around for different purposes, decided to try and boot w/ that & a live environment on USB (as if it was a completely new machine.) This was on my mind to try as I had been wondering if it might be some weirdness to do with either the bootloader on the SN850, or LUKS or something. Worth trying, but no dice.

Of course, system does boot with the fallback kit of RAM (the humble 2400, 8GB kit), just as before; I have not tried booting with mixed sticks yet, but I’m not in a rush to, as I really expect that to behave identically to how it did before, just like everything else has done.

See end section for some thoughts going forward from here.


What could it be?

original afterword

Well, obviously I don’t know. Any thoughts? One very real possibility is that I was shipped defective RAM. I have not discounted this; however, being that neither stick would boot individually, in addition to not booting together, and that the machine is hot off the presses with the first release of BIOS for this model, I would like to exhaust other possibilities before going through the rigamarole of trying to return the RAM - especially given that, if I get an identical replacement kit, the problem could very well remain. Unfortunately, I don’t have another machine that takes DDR4 SO-DIMMS, so I can’t really test them independent from the Framework.

Ok… that’s a little bit long, but I wanted to include all the preliminaries, so as not to retread ground. It is a bit of a frustration, but we’re already learning new things, and that’s never a bad thing. Plus, at this point, I can swap RAM in this thing blindfolded, faster than a marine assembles their weapon… Thanks for your attention, and thanks ahead of time for any help… Cheers!

EDIT 09-11: So, where to go from here? It’s a pretty perplexing situation. I am reaching back out to FW support to see what they would like to do. I feel like a fair bit has been eliminated at this point, but since there are so many factors at play (brand new hardware that just launched, first wave firmware and so-on), it’s hard to really feel confident about anything. My thoughts are circling these possibilities:

  • RAM (vendor or manufacturer side): It’s feasible the SKU itself (CT16G4SFD832A) has issues across the whole production. Notably, I have noticed that it is not listed any longer on Crucial’s site; I know for sure it used to be, because I looked it up there before purchasing it (from Newegg). It probably has been discontinued, but still seems odd. Being that I’ve now had 2 randomly selected kits of the SKU, both of which have passed several memtest runs (albeit not at the advertised speed & while paired with a mismatched stick), it only seems like something at a larger scale would be feasible. However, I have not seen anything really fitting this (say, like a recall), while googling about these issues many times over the past few weeks.
  • RAM (FW side): Consider this list, that I also posted above, of officially recommended RAM options. The problematic kit in question (again, CT16G4SFD832A) appears there, right toward the top. I selected the part based on that. That post is dated from July 7, a handful of weeks after manufacturing began on the 12th gen models (probably smack in the middle of it, for this batch anyway), and well before any consumer received an actual production unit. One thing that has occurred to me, probably those recommendations are really based on what worked well with 11th gen, and maybe some specification info Intel communicated to FW, pre-production. That list does not differentiate between 11th and 12th gen, it’s just general. Could be that changes between the generations were more substantial than anticipated, and this has yet to be corrected for in published info, firmware updates, or what have you.
  • Embedded Controller: It’s been reset a few times, and matched with different things (BIOS settings, hardware etc.) Since the main-board has been tested on two units, this is now the remaining hardware from FW that seems it could possibly be at play, at least in my eyes. This is pretty out of my depth though, I know very little about it. Probably the farthest fetched one.

Of course it could be none of those things, and something completely different. Kind of frustrating that this thing is still going, but I’m pretty committed to figuring out what the deal is now. It seems like at least some others are experiencing the same issue (see @Salt 's comments below). I will keep updating here as things develop.

Yeesh, I really need to learn how to write… shorter. Sorry bout it… :sweat:

Oh, also, one bonus weird thing I noticed. Was looking around at the system to see anything unusual, and one thing caught my eye. Peep this output from inxi:

❯ inxi -Ixxx
Info:
  Processes: 412 Uptime: 2d 5m wakeups: 64282 Memory: 7.47 GiB
  used: 4.45 GiB (59.6%) Init: systemd v: 250 target: graphical (5)
  default: graphical Compilers: gcc: 12.1.1 Packages: N/A note: see --pkg
  Shell: Zsh v: 5.8.1 running-in: foot inxi: 3.3.19

particularly: Uptime: 2d 5m wakeups: 64282. that seems… like a lot of wakeups to me. For context, I have an older ThinkPad (T560), and it gives this for the same bits: Uptime: 4d 7h 21m wakeups: 156. Longer uptime, much fewer wakeups. Now, that one has mostly been sitting plugged into AC power on a desk, but not entirely. I have moved it around on battery a bit. Obviously this is not a scientific comparison (it’s running Manjaro Sway, not really similar to Fedora with GNOME in a lot of ways), but that still just seems like… a lot of wakeups, to me. It’s not something I know a lot about, so maybe that is not a weird number. Seemed worth a bonus post though, ha.
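
To put those two readings on the same footing, here is a rough sketch that turns each uptime into hours and divides the wakeup count by it. The numbers are the ones quoted above; the helper names are made up, and the parser only assumes the `Nd Nh Nm` style of uptime seen in these two outputs.

```python
import re


def uptime_hours(text: str) -> float:
    """Parse an uptime such as '2d 5m' or '4d 7h 21m' into hours."""
    units = {"d": 24.0, "h": 1.0, "m": 1.0 / 60.0}
    parts = re.findall(r"(\d+)([dhm])", text)
    if not parts:
        raise ValueError(f"could not parse uptime: {text!r}")
    return sum(int(n) * units[u] for n, u in parts)


def wakeups_per_hour(uptime: str, wakeups: int) -> float:
    """Average number of wakeups per hour of uptime."""
    return wakeups / uptime_hours(uptime)


framework = wakeups_per_hour("2d 5m", 64282)
thinkpad = wakeups_per_hour("4d 7h 21m", 156)
print(f"Framework: {framework:.0f}/h, ThinkPad: {thinkpad:.1f}/h")
```

That works out to somewhere around 1,300 wakeups per hour on the Framework against roughly 1.5 per hour on the ThinkPad, so the gap is about three orders of magnitude even after accounting for the different uptimes. It says nothing about why, only that the difference is not an artifact of uptime.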

You could try an EC reset, by holding down the Power button / Fingerprint reader for 20 seconds.
Besides that I have no idea right now :frowning:

@Simon_F hmmmm, interesting. I feel like at one point or another, I must’ve held power button for that long (many of the no-boot session poweroffs, the button takes a while), but I can’t say for sure. I will give this a go tomorrow when I can also swap the RAM back beforehand (now, I must sleep, it is 2am where I am, ha.) Thank you for the idea !

Do you have another device you can test the RAM in? That way you can make sure you didn’t get RAM that is DOA.

Unfortunately, this didn’t change anything. Interesting idea though, I hadn’t seen much around about the EC, looking through various support things.

Alas, I do not. This did actually occur to me. An RMA may be inevitable… just want to check through all options before going through that process. Thanks though!

Understandable! I wish you the very best of luck

Apologies if I missed this in your post, but does it boot at least to where you can get into BIOS settings? If so maybe try booting into a live usb and running a memtest.

Also, completely unusual test but what happens when you do 1x16 and 1x4 stick?

@Smorez - yeah… kinda thinking I should’ve edited the post to be a bit more svelte, it’s a bit of a sprawl lol

I can “boot”/launch into BIOS/UEFI (idk if this is technically a boot? but yes) when any combo of the low end ram is installed (1x4, 2x4, in any slot combo), as well as properly boot the OS from NVMe. With the higher end kit, the behavior is always exactly the same (described above under “What does it actually do?” heading), with any configuration (1x16, 2x16, in any slot combo.)

You have tried the kits together but what about split up? In the primary RAM slot put one of the 4gb sticks and in the secondary put one of the 16gb sticks. If it boots see how much RAM is listed (it will force to the slower stick speed but this is to test if it’s being recognized.) If it comes up as 20gb of ram run a memory test in linux.

@Smorez ok, now this is an idea. I wasn’t really positive if doing that was a good idea. I am going to try it…

It has always been recommended practice to use the same manufacturer, size, and speed but it won’t harm a computer if you use mismatched pairs. Mainly it’s because a lot of consumers don’t understand RAM speed and put a 1666 paired with a 3200 and are like wtf. Mainly as long as you understand that with speed it just forces the faster RAM stick to the slower speed then you’re golden. As long as it’s the same form factor it is able to be done, and in this case allows you to test it with a known good.
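
As a quick illustration of that rule of thumb, here is a small sketch using the two parts from this thread. It only encodes the simplification described above (capacities add up, speed drops to the slower stick); the speed a real board settles on depends on what the firmware negotiates, so this is an approximation, and the `Stick` type and `mixed_pair` helper are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stick:
    part: str
    size_gb: int
    speed_mts: int  # rated speed in MT/s


def mixed_pair(a: Stick, b: Stick) -> tuple[int, int]:
    """Return (total capacity in GB, expected speed in MT/s).

    Rule of thumb only: capacities add, and both sticks are assumed to
    run at the slower of the two rated speeds.
    """
    return a.size_gb + b.size_gb, min(a.speed_mts, b.speed_mts)


small = Stick("CT4G4SFS824A", 4, 2400)
large = Stick("CT16G4SFD832A", 16, 3200)
print(mixed_pair(small, large))  # prints "(20, 2400)"
```

So a 1x4 plus 1x16 setup should show up as 20GB running at the 2400 stick’s speed, which is a handy sanity check for whether the larger stick is being recognized at all.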

@Smorez

Right, yeah this totally makes sense. Was just being a bit cautious.

And also, it boots with mixmatched sticks! First attempt as well, 1x4 SODIMM in channel 0, 1x16 SODIMM in channel 1.

20G showing up (well, you know)

Fedora seems to have 2 things in the repos to use. Currently running memtester.

Been looping for a little while, I have not seen one not say “ok”

Here in a second gonna run memtest86+ the classic. Apparently it boots from grub? weird to think about, I don’t think I’ve actually run any memory tests not on windows… and I’ve been using linux in all personal settings for several years now. Hmmm…

I will post again in a while with memtest86+ results. I’m very happy to see this movement, it’s the only thing that has changed anything, so thanks much @Smorez ! ! Results so far almost make it more mysterious, to me, as to why the 3200 sticks won’t boot by themselves…

image

LOL

Guess I’ll be reading some man pages tonight…

Now I’m curious. Since the system has now recognized it and has it listed in the hard memory, what will happen if you remove the 4gb and boot off the 16 that was paired?

I was also thinking about this. Will try it. Currently running memtester again, as I neglected to run it with sudo initially. Also, in my haste, I passed it against 1.7G rather than 17G, like a fool. Real smooth…
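
One way to avoid fat-fingering the size is to derive it from what the kernel reports as available. Below is a minimal sketch that reads the `MemAvailable` line from `/proc/meminfo`-style text, leaves some headroom, and builds a `memtester <size>M <loops>` command line. The 1024 MB headroom and the sample text are assumptions picked for illustration, and the function name is made up.

```python
def memtester_command(meminfo_text: str, headroom_mb: int = 1024, loops: int = 1) -> str:
    """Build a memtester command sized from the MemAvailable line.

    headroom_mb is an arbitrary safety margin (an assumption) so the
    test does not try to lock every last free page.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            available_kb = int(line.split()[1])  # value is reported in kB
            break
    else:
        raise ValueError("MemAvailable line not found")
    test_mb = available_kb // 1024 - headroom_mb
    if test_mb <= 0:
        raise ValueError("not enough available memory to test")
    return f"sudo memtester {test_mb}M {loops}"


# Sample text standing in for the real file; on a live system you would
# pass open("/proc/meminfo").read() instead.
sample = "MemTotal:       20312345 kB\nMemAvailable:   18874368 kB\n"
print(memtester_command(sample))  # prints "sudo memtester 17408M 1"
```

With the made-up sample above, 18 GiB available minus 1 GiB of headroom gives 17408M, i.e. the 17G that was intended rather than 1.7G.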

It is running much slower, perhaps not all too surprising.

— this post was delayed bc I hit the wee bab limit —

The answer to this question, is that even after booting while paired alongside a 1x4 dimm, neither 1x16 stick will boot by itself, afterward. The post code is the same…

Have booted successfully with both pairs of mixmatched 20GB options; let’s say, 4A with 16A, and 4B with 16B (also flopping which channel has the larger/smaller DIMM, between those combos), and run memtester with seemingly non-problematic results. I haven’t tried the exhaustive 8 possible combos of all those things (2sticks x 2sizes x 2channels), but not really expecting any of those would yield new results, after the first two.
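
For the record, the 8 combos can be listed mechanically. This sketch uses the 4A/4B/16A/16B labels from above and assumes each of the two tested pairings was tried with the 16GB stick in both channels; if that is not quite what happened, the "untested" list would be longer.

```python
from itertools import product

small_sticks = ["4A", "4B"]
large_sticks = ["16A", "16B"]
channels = [0, 1]  # which channel holds the 16GB stick

# Every (small stick, large stick, channel of the large stick) layout.
all_combos = list(product(small_sticks, large_sticks, channels))

# Pairings reported as tested; assumed to cover both channel orders.
tested_pairs = {("4A", "16A"), ("4B", "16B")}
untested = [c for c in all_combos if (c[0], c[1]) not in tested_pairs]

print(len(all_combos), "layouts in total")
for small, large, large_channel in untested:
    print(f"channel {large_channel}: {large}, channel {1 - large_channel}: {small}")
```

Under that assumption, the 4 layouts left over are the cross pairings (4A with 16B, 4B with 16A) in either channel order.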

At this point I think it’s RMA time sadly.

Yeah… gonna reach out to Framework and see what they say. Thanks for your help @Smorez !!

Edited orig. post to add an update to situation (spoiler: still unresolved), as well as pare it down to be more readable.