I have done a bit more debugging on the reset signal and have narrowed the problem down somewhat.
I have just gotten a PCIe 3.0 x1 to SATA adapter card that I was about to put in my home server, but got the idea to try it out here first. It booted with PEX_RST# without an issue. The GPU still does not work at all, so this behavior is baffling to me.
Then I did some testing by measuring the reset signal, which is supposed to go high (3.3V) once power and REFCLK are stable and stay that way until power down.
When I have the GPU connected, the reset signal is 3.3V only while I am in BIOS initialization or enter BIOS settings. The GPU ramps up its fans and turns them off like it usually does on a desktop. So the GPU is 100% detected during BIOS initialization.
Next, the firmware/BIOS initialization is finished and the laptop starts booting into an OS (tried both Linux and Windows). The reset signal suddenly drops to 0V and never goes back up. This is exactly why the GPU is not detected at all, since the signal going low literally tells the GPU to go into a “reset” state.
The reason why SSD1_RST worked on my end is that, for some reason, it sits constantly at around 1.8V while using the x8 configuration on the EEPROM. My guess is that my RTX 4070 finds that fine, even though it is not at the correct high of 3.3V, and continues the PCIe link.
When I don’t have the GPU plugged in, the PEX_RST# signal keeps its high 3.3V even after BIOS initialization. It also stays high with the SATA PCIe card I got, so that is why that one was recognized fine at boot and even displayed the SATA drives I connected to it.
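To put the 1.8V reading in context, here is a minimal sketch that classifies a measured voltage against input thresholds. The threshold values (0.8V maximum for low, 2.0V minimum for high) are an assumption based on typical 3.3V logic levels, not something taken from the FW16 schematics or the GPU's datasheet, so treat the result as a rough guide only.

```python
# Assumed thresholds for a 3.3V logic input (typical values, NOT from
# the FW16 schematics or the GPU datasheet).
V_IL_MAX = 0.8  # at or below this, the input reads as low
V_IH_MIN = 2.0  # at or above this, the input reads as high


def classify_level(volts: float) -> str:
    """Return 'low', 'high', or 'undefined' for a measured voltage."""
    if volts <= V_IL_MAX:
        return "low"
    if volts >= V_IH_MIN:
        return "high"
    return "undefined"


if __name__ == "__main__":
    for reading in (0.0, 1.8, 3.3):
        print(f"{reading:.1f}V -> {classify_level(reading)}")
```

Under those assumed thresholds, 1.8V lands in the undefined band, so a card accepting it as "not in reset" would be tolerance on the card's side rather than guaranteed behavior.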
To also make sure that it is nothing odd with my GPU, I tested it with 3 different desktop motherboards and they all recognized it and output video from it.
So now it seems like there is some issue either in the firmware of the laptop when using an x8 card, or there might be some pin we are missing that has to be connected to tell the laptop to keep PEX_RST# high after firmware initialization. I am leaning more towards the latter since the dGPU modules use x8 and I assume they also rely on PEX_RST#.
Oh right, I forgot to mention the most important part. When I configured the EEPROM to do 4x2 or 4x1 PCIe, PEX_RST# worked fine even with the GPU (but of course limited to x4 speeds), so it really is some odd issue with x8 configurations. You can give that a try as well.
I did already inform Framework about this, so I do hope they will be able to share some more info.
The FW16 Schematics are not detailed enough to tell us where the PEX_RST# pin on the interposer connects on the mainboard.
If we had more details, we could maybe write a simple program to set it high or low as needed.
I could not see anything obvious in the EC source code indicating a GPIO for it. There are various GPIOs on the EC for GPU_PWR_EN etc.
In the FW16 Schematics that are there, there are two outputs from the mainboard that are then ANDed together to make the DGPU_PEX_RST# on connector pin A5.
APU_PCIE0_RST# (AND) PEX_RST# == DGPU_PEX_RST#
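A small sketch of what that AND means in practice, assuming both inputs are plain active-low resets (high = released, low = in reset). The function name is made up for illustration; only the signal names come from the schematics.

```python
def dgpu_pex_rst(apu_pcie0_rst: bool, pex_rst: bool) -> bool:
    """Model DGPU_PEX_RST# as the AND of the two mainboard outputs.

    True means the line is high (reset released),
    False means the line is low (device held in reset).
    """
    return apu_pcie0_rst and pex_rst


if __name__ == "__main__":
    print("APU_PCIE0_RST#  PEX_RST#  ->  DGPU_PEX_RST#")
    for apu in (False, True):
        for pex in (False, True):
            out = dgpu_pex_rst(apu, pex)
            print(f"{int(apu):^14}  {int(pex):^8}  ->  {int(out):^13}")
```

The point of the table is that either input going low is enough to pull DGPU_PEX_RST# low, so measuring only the combined pin does not tell you which of the two sources dropped.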
Maybe you could ask FW support how to write a simple program to set the DGPU_PEX_RST# high or low.
If you have an EC CCD, maybe show the output of “gpioget” for when it is in the BIOS and PEX_RST# is high, and again when the OS is booted and PEX_RST# is low.
One might see if there are any EC GPIO differences.
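Comparing two gpioget dumps by eye is error-prone, so here is a rough sketch of a script that diffs them. It assumes each line of the dump looks like a 0 or 1, an optional `*`, and then the GPIO name; that line format is an assumption, so adjust the regex to whatever your EC console actually prints. The GPIO names in the sample are placeholders, not real FW16 names.

```python
import re

# Assumed line format: "<0|1>[*] <NAME>", e.g. "  1* SOME_GPIO".
LINE_RE = re.compile(r"^\s*([01])\*?\s+(\S+)\s*$")


def parse_gpioget(text: str) -> dict[str, int]:
    """Map GPIO name -> level for every line that matches the format."""
    states = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            states[match.group(2)] = int(match.group(1))
    return states


def diff_gpio(before: str, after: str) -> dict[str, tuple[int, int]]:
    """Return {name: (before_level, after_level)} for GPIOs that changed."""
    a, b = parse_gpioget(before), parse_gpioget(after)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


if __name__ == "__main__":
    # Placeholder dumps; paste the real BIOS and OS captures here.
    in_bios = "  1  EXAMPLE_GPIO_A\n  0  EXAMPLE_GPIO_B\n"
    in_os = "  0* EXAMPLE_GPIO_A\n  0  EXAMPLE_GPIO_B\n"
    for name, (old, new) in diff_gpio(in_bios, in_os).items():
        print(f"{name}: {old} -> {new}")
```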
SSD_RST shouldn’t be necessary at all when doing x8 according to Framework. But yes, please do try to set it to 4x1 and see if it changes anything as it might work then.
I’m actually not sure, but I’m pretty sure it can’t be over PCIe? I’ve got no idea how that would work. I just assumed it would be over the I2C link or one of the reserved pins from manufacturer to manufacturer.
I am actually so confused right now. I went ahead and installed the board that is connected to PEX_RST once again to just look at the EC logs during boot. The GPU just decided to connect to the laptop at x8 now… I’ll share the logs nevertheless, although I don’t know how much they’ll help.
I think the EEPROM from the expansion card is read once when the EC boots.
So, each time you change the EEPROM, you probably need to reset the EC.
EC Reset procedure:
Power off laptop (not standby).
Unplug PSU.
Wait 60 seconds.
Power on laptop.
Plug in PSU.
“ectool console” and check that the timestamps are close to zero. The timestamps are timestamps since the EC booted.
From @Kieran_Levin’s message earlier, the APU controls the PEX_RST GPIO, and not the EC.
The CPU / APU has 256 GPIOs, so we would need to know which one to toggle to control the PEX_RST.
I suppose you could put a voltmeter on the PEX_RST and experiment with APU GPIOs until one toggles it. I don’t know if that has the potential to damage the mainboard or not. So, it might be worthwhile to just wait until Kieran responds with which one it is.
I did run gpioget here, but since the APU controls it I bet there isn’t a whole lot of info. The first one is in BIOS settings, the second is immediately after, when Windows starts booting, and the third is when Windows has fully booted. I did not have an EC CCD card from DHowett, but I did a DIY solution that does the same job over UART with a Pi Pico that sends the command immediately on boot.
But now it basically works, I have no idea how as I changed nothing. I just left the laptop off overnight, swapped the boards and tried it out.
I have been dealing with this issue for months now and never had it work when connected to PEX_RST, so I do not know what exactly changed. The only thing I can think of is that the caching behavior of the EEPROM read was the issue, but I did have the laptop off and disconnected for longer durations while I was testing, so I am unsure what exactly happened.
I did do a complete BIOS reset, battery unplug and then another BIOS reset with that internal switch between the RAM and battery (press for 2 seconds 10 times) yesterday. But it still did not work then, so I am baffled how it suddenly started to work today.
Another interesting update. So I soldered the connection back to PEX_RST on the board I had connected to SSD1_RST. The GPU was not initialized yet again.
Then I decided to go into the BIOS and do a factory reset, save the changes, disconnect the battery, and then hold the power button to drain anything left in the system.
After plugging it into power, that board started to actually work as well. I do not know what to make of this exact behavior.
I did roll back to v3.07 yesterday, so I’m going to try going up to v4.03 now to see if stuff breaks yet again.
And now it broke yet again using 4.03 and 4.02. I did a BIOS reset and battery disconnect after every BIOS version change. After going back to 3.07 it started to work again.
EDIT: double checked this behavior to make sure it is not a one time thing.
I’m trying to test it with the Nvidia card and finding my Framework strangely super slow. Any ideas what’s causing this? It seems like a power issue: it’s booting but is barely warm to the touch.
The lagging issue was weird, but seemed to be EEPROM related. It fixed itself once I cleaned up some of the joints. Unfortunately the Nvidia card hasn’t given any joy.
I’ve just bought a new MXM card though - known working. Onboard BIOS. Should arrive next week.