Coreboot/OpenSIL Porting Underway for AMD FW16

Oh wow awesome work! Coreboot working would be amazing!

Do you know if the pinout is the same on the JECDB connector for Lilac? My Glasgow got lost in the post when I moved to Europe :frowning:

I wish Framework did debugging like ChromiumOS, with CCD + SuzyQable.

Worst case, I can simply use my PCBite to probe the 8 pins and write that way. Do you know if flashrom (with a Tigard @ 1.8 V) supports the specific IC?

Currently I’m only really interested in replacing the Trust Anchor so I can sign my own CAP updates. You mentioned AMD’s ROM Armor; would replacing the cert get around this?

Ah, AMD’s ROM Armor is AmdPspRomArmor3Smm. I think I can just yeet (NOP) this, then write freely from the OS if I wanted easier access to the IC.

They do.

This is fab, I expect to make changes to my EC (Lilac Branch) and wanted to know if recovery is possible, so this is great.

I’ve always wanted to have PON work with a key-combo. Does this CCD grant you the ability to re-flash the EC as well? Chromium devices have this, but I think that happens over a different IC (Titan before that was a thing).

Coming in to this thread after trying to get my Framework 16 AMD to display to external monitors on boot.

I’m not as in-depth a developer as others in this thread, but I’ll test where I can. What’s the best way an end user can contribute?

@Dylanger
I work on EC code.
I have a safe way to experiment on it that loads the test EC code into the RW image and then sysjump to it.
If the code fails, or you power off, it reverts to the RO image.
My instructions for doing it are here:

Note, compile with “zmake build lilac” instead of lotus for your lilac mainboard.

With the OpenSIL dev work on the FW16, it would be nice to have a dual BIOS image, with a way to fall back to a known-good image.
Is there enough flash space for a dual image?

Yes, but it’s unclear how this works on AMD.

SPI flashes on Intel platforms are memory-mapped to 16MB, so BIOS region cannot overlap that region (but if you use upper 16MB for IFD/ME/GBE and lower 16MB for BIOS regions, that works fine).

PSP definitely supports an A/B recovery scheme, but without documentation I don’t know how this works or how it relates to the x86_64 firmware. When I try to allocate more than 16MB, coreboot can’t find CBFS, so either this is a bug in coreboot or flash on AMD is memory-mapped as well.

Another possibility is, of course, a bug in OpenSIL. The Phoenix PoC is in such a terrible state that the more we work on this (my coworker stepped up to help), the more bugs we find.
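
To make the 16MB concern concrete, here is a minimal sketch of the "is this region visible through the memory-mapped window" check. It assumes a 16 MiB window that ends at 4 GiB; where that window sits inside a larger flash chip (`window_offset`) is a pure assumption here, and whether Phoenix maps flash this way at all is exactly the open question. The function names are made up for illustration.

```python
MIB = 1024 * 1024
FOUR_GIB = 1 << 32


def visible_in_window(region_offset, region_size, window_offset, window_size=16 * MIB):
    """True if the flash region lies entirely inside the memory-mapped window."""
    region_end = region_offset + region_size
    window_end = window_offset + window_size
    return window_offset <= region_offset and region_end <= window_end


def mmio_address(flash_offset, window_offset, window_size=16 * MIB):
    """CPU address of a flash offset, assuming the window ends at 4 GiB."""
    if not (window_offset <= flash_offset < window_offset + window_size):
        raise ValueError("flash offset is outside the memory-mapped window")
    return FOUR_GIB - window_size + (flash_offset - window_offset)


if __name__ == "__main__":
    # Hypothetical 32 MiB chip with the upper 16 MiB mapped (an assumption).
    window = 16 * MIB
    print(visible_in_window(16 * MIB, 16 * MIB, window))  # fits in the window
    print(visible_in_window(12 * MIB, 20 * MIB, window))  # spills below the window
    print(hex(mmio_address(16 * MIB, window)))
```

If a CBFS region larger than the window were only reachable through such a mapping, part of it would simply not be addressable, which would look a lot like "coreboot can’t find CBFS".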

For example, we finally found and fixed the issue with missing APIC, we now have both (NBIO and FCH) APICs present:

[DEBUG] CPU_CLUSTER: 0 init finished in 1230 msecs
[DEBUG] DOMAIN: 00000000 init
[DEBUG] IOAPIC: Initializing IOAPIC at fd180000
[DEBUG] IOAPIC: ID = 0x01
[DEBUG] IOAPIC: 32 interrupts
[DEBUG] IOAPIC: Clearing IOAPIC at fd180000
[DEBUG] DOMAIN: 00000000 init finished in 16 msecs
[DEBUG] PCI: 00:00:01.0 init finished in 0 msecs
[DEBUG] PCI: 00:00:02.0 init finished in 0 msecs
[DEBUG] PCI: 00:00:03.0 init finished in 0 msecs
[DEBUG] PCI: 00:00:04.0 init finished in 0 msecs
[DEBUG] PCI: 00:00:08.0 init finished in 0 msecs
[DEBUG] PCI: 00:00:14.3 init
[DEBUG] RTC Init
[DEBUG] IOAPIC: Initializing IOAPIC at fec00000
[DEBUG] IOAPIC: ID = 0x00
[DEBUG] IOAPIC: 24 interrupts
[DEBUG] IOAPIC: Clearing IOAPIC at fec00000
[DEBUG] IOAPIC: Bootstrap Processor Local APIC = 0x00
[DEBUG] PCI: 00:00:14.3 init finished in 24 msecs
[DEBUG] PCI: 00:00:14.6 init finished in 0 msecs
[DEBUG] PCI: 00:01:00.0 init finished in 0 msecs
[DEBUG] PCI: 00:02:00.0 init finished in 0 msecs
[DEBUG] PCI: 00:03:00.0 init finished in 0 msecs
[DEBUG] PCI: 00:03:00.1 init finished in 0 msecs
[DEBUG] PCI: 00:03:00.2 init finished in 0 msecs
[DEBUG] PCI: 00:03:00.6 init finished in 0 msecs
[DEBUG] PCI: 00:04:00.0 init finished in 0 msecs
[DEBUG] PCI: 00:05:00.0 init finished in 0 msecs
[INFO ] Devices initialized

[…]

ACPI Debug: “PCI: _SB.INTA._STA: 000000000000001F, Disabled”
ACPI Debug: “PCI: _SB.INTA._PRS => APIC”
ACPI Debug: “PCI: _SB.INTA._CRS APIC: 000000000000001F”
ACPI: PCI: Interrupt link INTA configured for IRQ 0
ACPI: PCI: Interrupt link INTA disabled
ACPI Debug: “PCI: _SB.INTA._DIS APIC”
ACPI Debug: “PCI: _SB.INTB._STA: 000000000000001F, Disabled”
ACPI Debug: “PCI: _SB.INTB._PRS => APIC”
ACPI Debug: “PCI: _SB.INTB._CRS APIC: 000000000000001F”
ACPI: PCI: Interrupt link INTB configured for IRQ 0
ACPI: PCI: Interrupt link INTB disabled
ACPI Debug: “PCI: _SB.INTB._DIS APIC”
ACPI Debug: “PCI: _SB.INTC._STA: 000000000000001F, Disabled”
ACPI Debug: “PCI: _SB.INTC._PRS => APIC”
ACPI Debug: “PCI: _SB.INTC._CRS APIC: 000000000000001F”
ACPI: PCI: Interrupt link INTC configured for IRQ 0
ACPI: PCI: Interrupt link INTC disabled
ACPI Debug: “PCI: _SB.INTC._DIS APIC”
ACPI Debug: “PCI: _SB.INTD._STA: 000000000000001F, Disabled”
ACPI Debug: “PCI: _SB.INTD._PRS => APIC”
ACPI Debug: “PCI: _SB.INTD._CRS APIC: 000000000000001F”
ACPI: PCI: Interrupt link INTD configured for IRQ 0
ACPI: PCI: Interrupt link INTD disabled
ACPI Debug: “PCI: _SB.INTD._DIS APIC”
ACPI Debug: “PCI: _SB.INTE._STA: 0000000000000014, Enabled”
ACPI Debug: “PCI: _SB.INTE._PRS => APIC”
ACPI Debug: “PCI: _SB.INTE._CRS APIC: 0000000000000014”
ACPI: PCI: Interrupt link INTE configured for IRQ 20
ACPI Debug: “PCI: _SB.INTE._DIS APIC”
ACPI Debug: “PCI: _SB.INTF._STA: 0000000000000015, Enabled”
ACPI Debug: “PCI: _SB.INTF._PRS => APIC”
ACPI Debug: “PCI: _SB.INTF._CRS APIC: 0000000000000015”
ACPI: PCI: Interrupt link INTF configured for IRQ 21
ACPI Debug: “PCI: _SB.INTF._DIS APIC”
ACPI Debug: “PCI: _SB.INTG._STA: 0000000000000016, Enabled”
ACPI Debug: “PCI: _SB.INTG._PRS => APIC”
ACPI Debug: “PCI: _SB.INTG._CRS APIC: 0000000000000016”
ACPI: PCI: Interrupt link INTG configured for IRQ 22
ACPI Debug: “PCI: _SB.INTG._DIS APIC”
ACPI Debug: “PCI: _SB.INTH._STA: 0000000000000017, Enabled”
ACPI Debug: “PCI: _SB.INTH._PRS => APIC”
ACPI Debug: “PCI: _SB.INTH._CRS APIC: 0000000000000017”
ACPI: PCI: Interrupt link INTH configured for IRQ 23
ACPI Debug: “PCI: _SB.INTH._DIS APIC”
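
For anyone comparing boots, a small sketch that condenses the "Interrupt link" lines above into a table. It only parses the two kernel message shapes that appear in this log; the function name and the dictionary layout are made up for illustration.

```python
import re

# Excerpt of the kernel messages quoted above.
LOG = """\
ACPI: PCI: Interrupt link INTA configured for IRQ 0
ACPI: PCI: Interrupt link INTA disabled
ACPI: PCI: Interrupt link INTE configured for IRQ 20
ACPI: PCI: Interrupt link INTH configured for IRQ 23
"""

CONFIGURED = re.compile(r"Interrupt link (\w+) configured for IRQ (\d+)")
DISABLED = re.compile(r"Interrupt link (\w+) disabled")


def summarize_links(log):
    """Map each interrupt link name to its IRQ and whether it stayed enabled."""
    links = {}
    for line in log.splitlines():
        match = CONFIGURED.search(line)
        if match:
            links[match.group(1)] = {"irq": int(match.group(2)), "enabled": True}
            continue
        match = DISABLED.search(line)
        if match:
            links.setdefault(match.group(1), {"irq": None})["enabled"] = False
    return links


if __name__ == "__main__":
    for name, info in summarize_links(LOG).items():
        print(name, info)
```

The hex values in the _CRS lines line up with the decimal IRQs the kernel reports (0x14 is 20, 0x17 is 23).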

However, Linux hangs while initializing the USB controller:
pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window]
pci_bus 0000:00: resource 6 [mem 0xf0000000-0xfebfffff window]
pci_bus 0000:00: resource 7 [mem 0x8a0000000-0xfffcffffffff window]
pci_bus 0000:00: resource 8 [mem 0x000a0000-0x000bffff window]
pci_bus 0000:01: resource 1 [mem 0xf0300000-0xf03fffff]
pci_bus 0000:01: resource 2 [mem 0xf0200000-0xf02fffff 64bit pref]
pci_bus 0000:02: resource 1 [mem 0xf0400000-0xf04fffff]
pci_bus 0000:06: resource 0 [io 0x2000-0x2fff]
pci_bus 0000:06: resource 1 [mem 0xf0600000-0xf07fffff]
pci_bus 0000:06: resource 2 [mem 0x8b0a00000-0x8b0bfffff 64bit pref]
pci_bus 0000:07: resource 0 [io 0x3000-0x3fff]
pci_bus 0000:07: resource 1 [mem 0xf0800000-0xf09fffff]
pci_bus 0000:07: resource 2 [mem 0x8b0c00000-0x8b0dfffff 64bit pref]
pci_bus 0000:03: resource 0 [io 0x1000-0x1fff]
pci_bus 0000:03: resource 1 [mem 0xf0000000-0xf01fffff]
pci_bus 0000:03: resource 2 [mem 0x8a0000000-0x8b09fffff 64bit pref]
pci_bus 0000:04: resource 1 [mem 0xf0a00000-0xf0afffff]
pci_bus 0000:04: resource 2 [mem 0x8b0e00000-0x8b0efffff 64bit pref]
pci_bus 0000:05: resource 1 [mem 0xf0b00000-0xf0cfffff]
pci 0000:03:00.1: D0 power state depends on 0000:03:00.0
pci 0000:00:08.3: enabling device (0000 -> 0002)
pci 0000:05:00.3: enabling device (0000 -> 0002)
W: g.applet.interface.uart: 9074 frames dropped due to frame/parity errors
W: g.applet.interface.uart: 12174 frames dropped due to frame/parity errors
W: g.applet.interface.uart: 12174 frames dropped due to frame/parity errors
W: g.applet.interface.uart: 12174 frames dropped due to frame/parity errors

You would think that you would be able to simply disable the USB controller for now but… no, because OpenSIL ignores the configuration passed to it and brings the controller up anyway. That’s not even factoring in all the issues with sconfig etc.

If I had to summarize this project in one meme, it would be:

Oh my. Thank you for your work. I hope things get better with it.

I would recommend skipping the USB controller for now if you can.
One of them is very fragile. E.g.
“sudo lsusb -v” crashes one of the USB 3.2 chips and it disappears from the bus.
Other people have found that passing some particular patterns of data through them also crashes it.
You can blacklist the PCIe device in Linux, so it never enables it.
It is only the USB 3.x device that is problematic. It would still leave USB 2.0 and USB4/Thunderbolt devices working, so external keyboards and mice would still work over USB 2.0.

In slower time, one might be able to reverse engineer the USB3.x bring-up, and see which particular setup message / config message causes it to fail, and find a work around.

If the problem is coreboot/OpenSIL trying to enable it, you would probably need to check that it does the init using the exact order of commands over the PCIe bus that the existing AMD BIOS uses, to avoid a crash.
How does the crash show itself? Does the CPU force a reboot with sync-flood in S5_RESET_STATUS?

Are you saying that the opensil code quality is not great? At least it is open source, so users will improve that over time like you are.

I have done some work with the Chromium EC firmware for my FW16. It was doing some crazy stuff when I first saw it. I pointed out the problems and it is much improved now.
An example of the craziness:
It had 3 different functions for controlling whether to charge / discharge the battery. All 3 functions would be run at various times.
All 3 functions used a different algorithm, so one would charge it, the next would decide to discharge it, the 3rd would charge it again, resulting in a lot of charge/discharge flipping.

  1. it should have had just one algorithm.
  2. it should have had just one function.

It actually caused FW13 batteries to fry and swell.
The EC battery code is fixed now, but there are still many bugs. I found another one yesterday, regarding unreliable I2C transfers.
Interestingly, the EC code that FW uses is from Chromium, so from Google. It is the Google-originated code that is so bad, so you cannot really blame FW for that.

How easy would it be to port it to the FW 13 afterwards? I imagine it must be pretty similar? Since you seem to be going pretty quickly so far

I don’t think it would be too difficult to port from the FW16 to the FW13. The reason being, some people on the forum managed to flash the FW16 BIOS to a FW13 and it still booted. The EC and USB stopped working, but the OS booted. The problem was when they tried to flash it back to the FW13 BIOS: the BIOS rejected it, as it thought it was a FW16 now. So an RMA happened. Even though the USB did not work, they tried removing the NVMe, loading the correct BIOS onto it, and then flashing from that.

Oh wow really? I’m surprised you could go one way but not the other, I wonder why, maybe they forgot to check for the FW13 since it was first? Either way that’s good news, I almost wanted to buy a FW13 chromebook just for Coreboot before! (Too bad it’s old now)

For disabling the xhci controller, since OpenSIL really wants to initialize it, what about turning it off (D3 cold) after OpenSIL runs?

That also didn’t help, but we got a bit further (hitting xhci handoff).

I tried patching Linux to disable it, but my workstation crashed while compiling the kernel… and it looks like my luck with daily-driving a pre-production mobile SoC on a Chinese mATX motherboard ran out after 3 years (spent ~3 days debugging it, silicon degradation progressed to the point where it’s unable to train memory after I re-flashed coreboot).

My coworker continues hacking on it, but I won’t be able to do much this week as I need to order a new motherboard and CPU for my workstation and essentially re-build it (the fastest system I currently have access to is a Samsung Galaxy Chromebook (Google/Hatch/Kohaku), which would turn into a nuke if I attempted to compile anything on it, thanks to its fanless design).

At least I can still use my RAM sticks (32GB DDR4) on 14Gen Intel CPUs so that’s likely what I’ll go with :woman_shrugging:

I don’t think you need to compile your own Linux kernel.
Just put this in the kernel boot parameters:
pci-stub.ids=<vendor>:<device>
or
vfio-pci.ids=<vendor>:<device>

This prevents the normal device drivers from binding to it.
The parameter was originally used to ensure that no host device driver binds to a PCIe device, so that it can be passed through to a VM.
But the same parameter can be used for just disabling a device, which is what we need here.

So, for the FW16 7040 Series this would disable the xhci driver in Linux:
vfio-pci.ids=1022:15b9,1022:15ba,1022:15c0,1022:15c1

but, you might only need to disable one of them, if only one is problematic:
vfio-pci.ids=1022:15c0
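
A tiny helper sketch for building that parameter from a list of IDs, mostly to avoid typos when trying different combinations. The function name is made up; the only thing it knows about is the `<driver>.ids=<vendor>:<device>[,...]` shape shown above.

```python
import re

# A PCI ID here is four hex digits, a colon, four hex digits (vendor:device).
PCI_ID = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$")


def stub_ids_param(ids, driver="vfio-pci"):
    """Build a '<driver>.ids=...' kernel command line parameter."""
    if not ids:
        raise ValueError("need at least one vendor:device ID")
    for dev_id in ids:
        if not PCI_ID.match(dev_id):
            raise ValueError(f"not a vendor:device ID: {dev_id!r}")
    return f"{driver}.ids=" + ",".join(dev_id.lower() for dev_id in ids)


if __name__ == "__main__":
    print(stub_ids_param(["1022:15b9", "1022:15ba", "1022:15c0", "1022:15c1"]))
    print(stub_ids_param(["1022:15c0"], driver="pci-stub"))
```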

Your previous posts did not make it clear which PCIe device IDs it was having problems with.
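
If you can get far enough to have a shell (e.g. on the stock BIOS), a sketch like this lists the xHCI controllers and their IDs from sysfs. It relies on the standard PCI class code for xHCI (0x0c0330) and on the `class`, `vendor` and `device` files that Linux exposes per device under /sys/bus/pci/devices; the function name is made up.

```python
from pathlib import Path

# Serial bus controller (0x0c), USB (0x03), xHCI programming interface (0x30).
XHCI_CLASS = 0x0C0330


def find_xhci(sysfs_root="/sys/bus/pci/devices"):
    """Return (pci_address, 'vendor:device') for every xHCI controller found."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        try:
            pci_class = int((dev / "class").read_text().strip(), 16)
            vendor = int((dev / "vendor").read_text().strip(), 16)
            device = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries that are unreadable or malformed
        if pci_class == XHCI_CLASS:
            found.append((dev.name, f"{vendor:04x}:{device:04x}"))
    return found


if __name__ == "__main__":
    for address, dev_id in find_xhci():
        print(address, dev_id)
```

The USB4/Thunderbolt NHI controllers use a different programming interface, so they are not matched by this filter.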

I think the PCIe bus listing on the opensil is wrong.
What is this device?
pci 0000:05:00.3: enabling device (0000 -> 0002)

It looks to me that the pci_bus 0000:07 (from your output above) is not a PCI device ID?
It might be helpful to try to update the opensil PCI numbering to match the FW BIOS.

The FW16 7840HS with the FW BIOS 4.0.4 has the following devices:

 lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Phoenix IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Host Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Host Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Host Bridge
00:04.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Internal GPP Bridge to Bus [C:A]
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Internal GPP Bridge to Bus [C:A]
00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Internal GPP Bridge to Bus [C:A]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 71)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 7
01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
02:00.0 Non-Volatile memory controller: Sandisk Corp WD Black SN850X NVMe SSD (rev 01)
c1:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix1 (rev c2)
c1:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Radeon High Definition Audio Controller
c1:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Phoenix CCP/PSP 3.0 Device
c1:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15b9
c1:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15ba
c1:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Audio Coprocessor (rev 63)
c1:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Ryzen HD Audio Controller
c2:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Function
c2:00.1 Signal processing controller: Advanced Micro Devices, Inc. [AMD] AMD IPU Device
c3:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Function
c3:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15c0
c3:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15c1
c3:00.5 USB controller: Advanced Micro Devices, Inc. [AMD] Pink Sardine USB4/Thunderbolt NHI controller #1
c3:00.6 USB controller: Advanced Micro Devices, Inc. [AMD] Pink Sardine USB4/Thunderbolt NHI controller #2
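
To compare the two listings quickly, here is a rough sketch that pulls the bus numbers out of a coreboot-boot dmesg excerpt and a stock-BIOS lspci excerpt and shows which buses appear on only one side. The excerpts are copied from the posts above; the function names are made up. Note that bus numbers are assigned during enumeration, so a different numbering is not necessarily a bug in itself.

```python
import re

# A few lines from the coreboot/OpenSIL boot log quoted earlier.
COREBOOT_DMESG = """\
pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
pci_bus 0000:01: resource 1 [mem 0xf0300000-0xf03fffff]
pci_bus 0000:06: resource 0 [io 0x2000-0x2fff]
pci_bus 0000:07: resource 0 [io 0x3000-0x3fff]
pci 0000:05:00.3: enabling device (0000 -> 0002)
"""

# A few lines from the stock BIOS lspci listing above.
STOCK_LSPCI = """\
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Root Complex
01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
02:00.0 Non-Volatile memory controller: Sandisk Corp WD Black SN850X NVMe SSD (rev 01)
c1:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15b9
c3:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15c0
"""


def buses_from_dmesg(text):
    """Bus numbers from 'pci_bus DDDD:BB:' and 'pci DDDD:BB:DD.F:' lines."""
    hits = re.findall(r"^pci(?:_bus)? [0-9a-f]{4}:([0-9a-f]{2}):", text, re.M)
    return {int(bus, 16) for bus in hits}


def buses_from_lspci(text):
    """Bus numbers from 'BB:DD.F ...' lines of plain lspci output."""
    hits = re.findall(r"^([0-9a-f]{2}):[0-9a-f]{2}\.[0-7] ", text, re.M)
    return {int(bus, 16) for bus in hits}


if __name__ == "__main__":
    coreboot = buses_from_dmesg(COREBOOT_DMESG)
    stock = buses_from_lspci(STOCK_LSPCI)
    print("only under coreboot:", sorted(f"{b:02x}" for b in coreboot - stock))
    print("only under stock BIOS:", sorted(f"{b:02x}" for b in stock - coreboot))
```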

What’s the computer/SoC you’re using that’s failing? I’m curious

It was fun while it lasted, although it was incredibly cursed (i.e: vendor forgot to place a logic gate between 5V and 5VSB to keep RAM powered during suspend, so when I attempted to suspend the system it would lose RAM content in S3), ASPM wasn’t working so my system was using ~50W of power in idle (though 30W of that was an AMD GPU).

But hey, back then it cost me 160EUR and had performance equivalent to 13Gen i5 (for which the motherboard itself cost ~180EUR at the time).

All things considered, I’m impressed it lasted 3 years of (ab)use. It was on 24/7, compiling stuff everyday, technically running out-of-spec (65W PL1/105W PL2). All of that on pre-production, laptop CPU slapped onto mATX motherboard.

Definitely would not recommend though. I tried to get schematics from that company to fix ASPM and place a logic gate to fix S3, but they refused. Quality was absolutely terrible as well (VRMs would cook themselves if I hadn’t had a fan blowing directly at the MOSFETs, M.2 stand-offs were placed incorrectly, and I had to re-solder one of the M.2 slots because it was soldered so poorly that the tension applied by an NVMe drive made it come off the board).

Dealing with it made me want to design my own x86_64 board, even though I realize it’s a massive rabbit hole I likely wouldn’t have time for (calculating trace lengths for signal integrity takes an especially long time; you often see pre-production laptop boards floating around with broken Thunderbolt because the trace lengths were too mismatched and it fails link training etc.).

Regardless, I received help from friends and ordered an LGA1700 DDR4 board (that doesn’t have BootGuard) and an i9-14900K, which should be ~50% faster, so at least it’s a nice upgrade.

Will try to nag my coworkers to fix builder at work (last time I tried to use it I ran into cgroup weirdness) so I can still work on this in the meantime, even from a Chromebook that I mentioned above (which is my daily-driver on the go). Will plug Framework 16 (DUT) into my old laptop (ThinkPad X230) so it can be reflashed remotely. Basically building stuff using work builder, scp’ing it onto X230 in my house and then getting UART/power control by SSHing into X230. Won’t be ideal, but should work :wink:

The coreboot port will continue until morale improves :rofl:

thanks again for your work, super excited to see the final result.

You just gotta love those chinesium salvage boards, but they really do always have some quirks. I wish my AMD GPU idled at just 30W (ok fine, it kinda did that before I got a 280Hz monitor, now it’s just 100W all the time).

Part of me says do it, the other part says not worth it, would be pretty cool though. But yeah laying out memory and pcie traces would be hell, especially without the tools and arcane hard won tricks the industry has.