Debian 12 Apt updates Bricked AMD 7040?

@Michael_Liesenfelt
It’s a little unclear. Did the 16GB DDR5 memory module fix the problem, as in, it’s not bricked any more?
If you can get into the BIOS, what BIOS version is it?

I’m sorry I wasn’t clear. (editing original).

No, the problem isn’t fixed. The problem was never memory size and I was just indulging level-1 tech support by trying 16G.
The system is still bricked.
The screen never illuminates.
I’m not able to access BIOS.
My hypothesis is that it’s the GPU firmware somehow.

Current Hypothesis: The suspected package is firmware-amd-graphics, specifically in combination with something Framework has in their stock as-shipped BIOS/UEFI (May). The non-free-firmware repository in Debian 12’s sources.list is UNSAFE for Framework 13s.

It’s a big red flag that the moment I plug in the AC power the LED starts blinking RED and never stops, ram or no ram, big ram or small ram, on or off.

So I’m at the point of needing an RMA, a flashing kit to force a firmware downgrade, or needing to buy a new mainboard and block updates.

Did you try removing the battery? The EC might have got itself confused and need a reset.
I.e. remove battery, remove power cable, leave for 60 seconds, then plug power in and try to switch on.

I unplugged the machine and left it all week until I had some free time today.

Framework Tech Support Response: Perform a memory shuffle, again.

I plugged in the battery and there were 2 red LED flashes. Then dark.
I plugged in the power cable and the red LED flashes began.
Power-on button, wait about a minute, 12xG & 8xG flashes, wait 5 minutes, screen never illuminates, processor gets hot.
Moved 1x16G memory from channel 0 to channel 1.
Power-on button, wait about a minute, 12xG & BGBGBGGG (0x15 // PCI enumeration complete), wait 5 minutes, screen never illuminates, processor gets hot.
Moved 1x16G memory from channel 1 back to channel 0.
Power-on button, wait about a minute, 12xG & BGBGBGGG (0x15 // PCI enumeration complete), wait 5 minutes, screen never illuminates, processor gets hot.

The red LED never stops blinking. WTH does that mean?

Framework, you want to give this motherboard to your firmware engineers to dump, learn from, correct, and QA. Framework, I think we are at the point where you want to send me a new replacement motherboard with a return shipping label.

Just as a reminder: You are on a community support forum. You are talking to community members. The community does not take orders from you.

The “firmware” in this package is a runtime image which is loaded onto the card during boot after the kernel starts, and it is not written to flash.
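One way to check this on a working Debian install is to look at the images as ordinary files on disk. The following is a minimal sketch, assuming the package places its images under /lib/firmware/amdgpu (the usual location, but treat the path as an assumption); the helper name list_firmware_images is hypothetical.

```python
from pathlib import Path


def list_firmware_images(directory: str = "/lib/firmware/amdgpu") -> list[tuple[str, int]]:
    """Return (file name, size in bytes) for each regular file in the directory.

    The default path is an assumption about where the distribution installs
    the runtime images; pass a different directory if yours differs.
    """
    root = Path(directory)
    if not root.is_dir():
        # Nothing installed at this location (or not a Linux system).
        return []
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    for name, size in list_firmware_images():
        print(f"{size:>10}  {name}")
```

Anything this lists is a plain file that the kernel driver reads at boot; removing or downgrading the package changes those files, not anything stored on the mainboard.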

The red blinking lights are just the chassis intrusion indicator. They typically blink when the top cover is off.

If the top cover is on and they are still blinking, the chassis switch is not fully depressed. There should be a foam or rubber block on the underside of the input cover that presses a small switch on the mainboard.

In the spirit of full transparency I’m sending this same stream of updates directly to Framework over email and to this community thread. Thanks for explaining the red blinking relative to the input cover!

I assumed the Debian stable repository updates would not disrupt any firmware, but I was mistaken in a way that you, I, and Framework don’t fully understand yet. I highly highly highly doubt using 96GB of RAM for too many reboot cycles broke the motherboard permanently. We are at the point where no amount of ‘buy new ram’ ‘reset the board’ ‘unplug the battery’ ‘shuffle the ram’ will fix this. Framework can choose to get this board and learn from it, or they can choose not to RMA and make me purchase a new board. I may be forced to buy a new board soon so I can start getting work done.

Just a thought. 5 mins wait might not be long enough. Try waiting 15 mins.

Another thing to try is to unplug the eDP cable to the screen and plug an external display into one of the side slots. The BIOS might then display on the external display.
The BIOS will only display if the RAM is working.
Maybe your screen is faulty, and this would help diagnose that.

That’s a great idea!

I unplugged both display cables. I plugged in an external display over a DP cable. I turned it on and waited. I got 12xG and 8xG LED flashes. The externally connected display never got any signal. It’s been more than 30 minutes and nothing. Fan spins, processor gets hot to the touch, and nothing.

On July 5th Framework Support Level1 said they were escalating my issue. I don’t think it ever made it out of Level1 support. Yesterday, July 20th, I got a very similar email from Framework Support Level1 stating that they were escalating my issue. :man_facepalming: It’s fairly clear they aren’t reading the thread and I’ve begun the script again at step1 memory shuffle. They don’t sell firmware flashing kits or have docs for factory flashing/unbricking the board so I’m stuck.

I am now forced to spend another $650+shipping and see if their sales/returns department is better than their support department. I know the board was good because I got my Debian environment set up with all hardware working, restored 1.5TB of data, and used an external DP screen on 96G of RAM and 48G of zram. Eventually, on their time, a returned board will make it to a firmware engineer to reinitialize and go back on the website for sale. On my time, I have work to do and don’t have another month to play with Level1 tech support.

This comes across as incredibly entitled and demanding. Don’t repeat such demands if you want further community assistance. We will lock this thread if you continue behaving like this.

Fair.
I am not entitled, so I already purchased another board this morning.

Hi. Those 12xG and 8xG LED flashes make no sense at all now.
If you unplugged the internal eDP display cable, you really should not get 12xG; it should be 11xG, 1xR. The last flash is the “internal display”, which, being unplugged now, should be red rather than green.
Those LED flashes are not matching up with what you are describing.
It could simply be some sort of bug in the BIOS. I have not unplugged my own eDP cable to see what LED flashes I get, but my assumption is that if the 12th flash is for “Internal display initialized OK”, then unplugging the internal display cable should reasonably result in a red flash.

With the external DP connected and the internal disconnected I get:
12xG + BGBGBGGG (0x15 // PCI enumeration complete)

With the external DP disconnected and the internal disconnected I get a new flash:
12xG + GGGGGBBB ( 0xE0 ? )
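For anyone following along, the two hex values above are consistent with reading the 8 flashes as one byte. This is a minimal sketch, assuming blue = 1, green = 0, and that the first flash is the least significant bit; that encoding is inferred from the codes quoted in this thread, not taken from official documentation, and the function name decode_post_code is hypothetical.

```python
def decode_post_code(pattern: str) -> int:
    """Decode an 8-flash blue/green pattern into a byte.

    Assumption (inferred from the codes quoted in this thread):
    blue = 1, green = 0, first flash = least significant bit.
    """
    if len(pattern) != 8 or set(pattern) - {"B", "G"}:
        raise ValueError(f"expected 8 flashes of B or G, got {pattern!r}")
    value = 0
    for bit, flash in enumerate(pattern):
        if flash == "B":
            value |= 1 << bit  # set the bit for each blue flash
    return value


for flashes in ("BGBGBGGG", "GGGGGBBB"):
    print(flashes, hex(decode_post_code(flashes)))
```

Under that assumption the first pattern decodes to 0x15 and the second to 0xe0, so the "0xE0 ?" reading is at least self-consistent with the 0x15 one.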

:person_shrugging:

I am a little confused. The eDP is the internal cable between the mainboard and the internal laptop display. So, how can you have the eDP connected and the internal disconnected?

(revised prior)
In the video I have both external and internal unplugged and I still got a pattern:
12xG + GGGGGBBB ( 0xE0 ? )

There should have been at least one red LED flash.

I also realized something profound about the Framework mission statement:

Our philosophy is that by making well-considered design tradeoffs and trusting customers and repair shops with the access and information they need, we can make fantastic devices that are still easy to repair.

My first experience with Framework has not been easy, and certainly not easy for me to repair. The problem is definitely not the RAM, not 16G, not 96G. Framework doesn’t have a document or information for flashing/restoring/validating firmware. This has all occurred within the 90-day warranty. I’m not an idiot and I want to help the root-cause analysis because I believe in the Framework mission.

Be aware that eDP doesn’t work exactly like a normal display port, and, while I have no practical experience to draw on in this case, I think it is likely that the laptop’s motherboard may not be as instantly aware of eDP disconnection as it would be when a normal display port is disconnected. eDP disconnection is not something that is allowed at any random moment. But I would expect it to be checked at power-on.

I have located Michael_Liesenfelt’s ticket and will take it forward from there.

Thank you Matt.

[SOLVED] TLDR:
I’m typing this on my fully functional Framework 13.
I was wrong. The firmware update didn’t cause the failure.
Framework was wrong. The memory didn’t cause the failure.
Both the MediaTek Wifi and Motherboard PCIe failed coincidentally, likely infant mortality.


Process

The new motherboard arrived and I installed it with 16G of RAM. Shockingly, the system failed to boot with a BIOS code of: BGBGGGGG. Yet another strangely unrelated code. I assumed the new motherboard was good and realized the only piece of hardware in common was now the wireless network card. I removed and set aside the stock MediaTek wifi and the system booted successfully and quickly on 1x16G Crucial DDR5!

This was a very positive sign, so now I wanted to verify that the Crucial 2x48G of memory worked successfully. Of course, it successfully trained in only a couple minutes (note the clock):

The Framework firmware memory test routine isn’t properly programmed to check any amount of memory up to 256G. Minor bugfix issue.

The Ubuntu LiveUSB booted successfully and all the memory works great. I used a known-good Atheros-based wifi module for the remainder of the tests.

Next I added my 2TB NVMe and set up efibootmgr to boot my previous Debian 12 system. At this point I was feeling confident the root of the problem had been discovered and Debian was not at fault. The system booted Debian just fine:


Linux version 6.1.0-23-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15)

Finally, I wondered if the original motherboard had failed, or if the failure was only because of the MediaTek wireless NIC. So I replaced the new motherboard with the old motherboard and only 1x16G of RAM, without any PCIe network card. The system failed to boot, with the same BIOS flashing as before.

At this point I am confident, by process of elimination and substitution, that both the original motherboard and the original network card had a simultaneous hardware failure well within the first 90 days. I’m not sure if the failures are correlated or not. I’ll pay to send @Matt_Hartley and the Framework team the original motherboard and original MediaTek card for their further analysis. I don’t have a PCI Express protocol analyzer.

Follow-up Tasks:

  1. Update the Framework 13 AMD 7040 memory compatibility page:
  • Crucial CT16G56C46S5.M8G1 16GB DDR5-5600 SO-DIMM OK - Limited validation
  • Crucial CT48G56C46S5.M16B1 2x48GB DDR5-5600 SO-DIMM OK - Limited validation

  2. Update Framework’s Level1 tech support instructions to include unplugging BOTH the NVMe and Wifi PCIe cards, just in case.

  3. Upgrade the firmware POST memory check routine to support up to 256G of memory, not just up to 68,717,379,584. ( 2^36 = 68,719,476,736 ?? )

  4. Framework needs to reconcile ‘not officially supported by the Ryzen 7040 Series platform’ with the official AMD 7840U product page, which cites support for up to 256GB of memory. I highly recommend supporting up to the processor memory controller limit of 2x128GB of memory, because AI/LLMs are going to use it and you know the Framework community loves upgrading!

Thanks to all of you in this thread, @James3, @Fraoch, @DHowett, @coucouf, @Adr, @Mario_Limonciello, @Matt_Hartley, and Framework Support. This wasn’t exactly the smoothest process, but I’m still going to be buying Framework. It was so easy to change components relative to every laptop I’ve ever repaired going back to 386 IBM ThinkPads. I will handle things offline from here with Matt.

It’s great that you found a solution that now works.
A failed wifi card taking out the PCIe controller on the mainboard, or vice versa, is rare.
You might also have been lucky that the wifi card did not break the new mainboard.
I guess we know what the 0xE0 code means now. Faulty PCIe controller.

Ah, it’s great that you figured that out! The motherboard and wifi card may have had some kind of voltage or current surge they experienced together, due to one of them failing, but that’s just speculation.

That’s just under 64 GiB of memory:

>>> 64 * 1024 * 1024 * 1024
68719476736
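To put a number on "just under", here is a quick sketch, assuming the 68,717,379,584 figure quoted earlier in the thread is a byte count (my reading, not something the firmware documents):

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

reported_limit = 68_717_379_584    # figure quoted in the follow-up tasks
full_64_gib = 64 * GIB             # 2**36 = 68,719,476,736

gap = full_64_gib - reported_limit
print(gap, gap // MIB)             # bytes short of 64 GiB, and the same in MiB

installed = 2 * 48 * GIB           # the 2x48G kit from this thread
print(installed > reported_limit)  # True means the kit exceeds the quoted limit
```

The gap works out to exactly 2 MiB, and a 2x48G kit is well past the quoted figure, which fits the idea of a check that only expected up to 64 GiB.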

I think that 128 GiB DIMMs don’t exist, and neither do 64 GiB DIMMs in this laptop size; 48 GiB is the biggest, and those are new for this generation of memory. So it seems the EFI firmware memory check was written expecting only up to 32 GiB + 32 GiB = 64 GiB of memory. Framework didn’t really write this part of the firmware; this is the domain of the Insyde UEFI “bios” company/contractors (and maybe the AMD AGESA package they use for low-level init …)