Issues with getting a PCIe device detected using v4 BIOS on the 7940HS

Ever since upgrading to any v4.XX BIOS on my 7940HS, my GPU is no longer detected at all when using the OCuLink 8i board. I decided to open a separate post to avoid filling up other projects’ threads. We previously talked in the MXM board thread.

To summarize, I upgraded to v4.03 again to do more testing. I ran dmem 0xfed815a0 4 -MMIO in the EFI Shell, which only returned 0x0000E500. Then I rebooted into my Arch Linux install and ran the script from this post Framework 16 to MXM Gpu - V0.1 Prototype design - #246 by James3, which returned 0x00A40000; hopefully that is correct. The GPU is still not detected within Windows 11, but I decided to check the lspci output:

01:00.0 VGA compatible controller: NVIDIA Corporation AD104 [GeForce RTX 4070] (rev a1)
01:00.1 Audio device: NVIDIA Corporation AD104 High Definition Audio Controller (rev a1)

And the GPU was literally there (it’s nowhere to be seen in Device Manager when I switch to Windows 11), but I couldn’t use nvidia-smi due to:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I then decided to check dmesg:

[    4.479214] nvidia: loading out-of-tree module taints kernel.
[    4.479222] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    4.566784] nvidia-nvlink: Nvlink Core is being initialized, major device number 511
[    4.569421] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[    4.571510] NVRM: The NVIDIA GPU 0000:01:00.0
               NVRM: (PCI ID: 10de:2786) installed in this system has
               NVRM: fallen off the bus and is not responding to commands.
[    4.571566] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[    4.572066] NVRM: The NVIDIA probe routine failed for 1 device(s).
[    4.572068] NVRM: None of the NVIDIA devices were initialized.
[    4.573231] nvidia-nvlink: Unregistered Nvlink Core, major device number 511

And then a bit later:

[    4.989955] nvidia-nvlink: Nvlink Core is being initialized, major device number 508

[    4.990171] i2c i2c-20: Successfully instantiated SPD at 0x50
[    4.993170] i2c i2c-20: Successfully instantiated SPD at 0x51
[    4.993303] piix4_smbus 0000:00:14.0: Auxiliary SMBus Host Controller at 0xb20
[    4.993958] i2c i2c-22: Successfully instantiated SPD at 0x50
[    4.994451] i2c i2c-22: Successfully instantiated SPD at 0x51
[    4.994621] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[    4.995048] pci_bus 0000:66: busn_res: [bus 66] is released
[    4.995524] NVRM: The NVIDIA GPU 0000:01:00.0
               NVRM: (PCI ID: 10de:2786) installed in this system has
               NVRM: fallen off the bus and is not responding to commands.
[    4.995579] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[    4.995694] pci_bus 0000:67: busn_res: [bus 67] is released
[    4.996315] NVRM: The NVIDIA probe routine failed for 1 device(s).
[    4.996318] NVRM: None of the NVIDIA devices were initialized.
[    4.996554] pci_bus 0000:68: busn_res: [bus 68] is released
[    4.996697] pci_bus 0000:69: busn_res: [bus 69] is released
[    4.996918] pci_bus 0000:6a: busn_res: [bus 6a] is released
[    4.997034] pci_bus 0000:65: busn_res: [bus 65-6a] is released
[    4.997213] nvidia-nvlink: Unregistered Nvlink Core, major device number 508

Then something regarding the snd_hda_intel driver (the GPU’s HDMI audio function):

[    5.061085] snd_hda_intel 0000:01:00.1: Unable to change power state from D0 to D0, device inaccessible
[    5.062675] cros-charge-control cros-charge-control.4.auto: Framework charge control detected, preventing load
[    5.064864] snd_hda_intel 0000:01:00.1: Unable to change power state from D3cold to D0, device inaccessible
[    5.065635] snd_hda_intel 0000:01:00.1: Disabling MSI
[    5.065644] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
[    5.206657] hdaudio hdaudioC1D7: no AFG or MFG node found
[    5.207080] snd_hda_intel 0000:01:00.1: no codecs initialized
[    5.208033] snd_hda_intel 0000:01:00.1: GPU sound probed, but not operational: please add a quirk to driver_denylist

And then yet again this:

[    5.312256] nvidia-nvlink: Nvlink Core is being initialized, major device number 508

[    5.312643] intel_rapl_common: Found RAPL domain package
[    5.313352] intel_rapl_common: Found RAPL domain core
[    5.313633] mt7921e 0000:03:00.0: WM Firmware Version: ____000000, Build Time: 20251118163234
[    5.313811] amd_atl: AMD Address Translation Library initialized
[    5.314524] input: HD-Audio Generic HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:08.1/0000:c3:00.1/sound/card2/input40
[    5.315600] nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[    5.316259] NVRM: The NVIDIA GPU 0000:01:00.0
               NVRM: (PCI ID: 10de:2786) installed in this system has
               NVRM: fallen off the bus and is not responding to commands.
[    5.316302] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[    5.317288] NVRM: The NVIDIA probe routine failed for 1 device(s).
[    5.317290] NVRM: None of the NVIDIA devices were initialized.
[    5.317840] nvidia-nvlink: Unregistered Nvlink Core, major device number 508
[    5.320166] snd_pci_ps 0000:c3:00.5: enabling device (0000 -> 0002)

It seems like it just cannot communicate with the GPU for some odd reason. But like I said, I have no issues with BIOS v3.07, so it feels like it might be something specific to v4.XX.

Ok, so some progress.
If lspci worked at some point, the PCIe lanes trained up and transferred the content that lspci returns,
i.e. “VGA compatible controller: NVIDIA Corporation AD104 [GeForce RTX 4070] (rev a1)”.
So the PCIe lanes were up, but the dmesg log appears to show them dropping out again.
So, my guess here would be problems with signal quality.

Maybe try “sudo lspci -vv” and look for the LnkSta lines.
That will give an idea as to what speeds it is managing to link train at.
The PCIe lanes are PCIe Gen 4, but you can force them down to PCIe Gen 3 simply by unplugging the power adapter from the FW16 laptop.
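Something like the following sketch would pull out just those lines (assuming the GPU is still at the 0000:01:00.0 address shown in the lspci output above):

```shell
BDF="0000:01:00.0"          # GPU address taken from the lspci output above
if command -v lspci >/dev/null && [ -n "$(lspci -s "$BDF" 2>/dev/null)" ]; then
    # LnkCap = what the link supports; LnkSta = what it actually trained to.
    # Reading the capability list generally needs root, hence sudo.
    sudo lspci -vv -s "$BDF" | grep -E 'LnkCap:|LnkSta:'
else
    echo "$BDF not visible on the bus here" >&2
fi
```

On a healthy Gen 4 x8 link, LnkSta should report “Speed 16GT/s, Width x8” (it may idle at 2.5 GT/s until the GPU is loaded).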

Unfortunately it does not contain any relevant info about the link speed:

01:00.0 VGA compatible controller: NVIDIA Corporation AD104 [GeForce RTX 4070] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Gigabyte Technology Co., Ltd Device 40ee
	!!! Unknown header type 7f
	Interrupt: pin ? routed to IRQ 144
	IOMMU group: 14
	Region 0: Memory at 90000000 (32-bit, non-prefetchable) [size=16M]
	Region 1: Memory at 7c00000000 (64-bit, prefetchable) [size=16G]
	Region 3: Memory at 8000000000 (64-bit, prefetchable) [size=32M]
	Region 5: I/O ports at a000 [size=128]
	Expansion ROM at 91080000 [disabled] [size=512K]
	Kernel modules: nouveau, nvidia_drm, nvidia

01:00.1 Audio device: NVIDIA Corporation AD104 High Definition Audio Controller (rev a1) (prog-if 00 [HDA compatible])
	Subsystem: Gigabyte Technology Co., Ltd Device 40ee
	!!! Unknown header type 7f
	Interrupt: pin ? routed to IRQ 47
	IOMMU group: 14
	Region 0: Memory at 91000000 (32-bit, non-prefetchable) [size=16K]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

Unfortunately even this does nothing. I checked that the BIOS setting to switch to Gen 3 on battery is enabled, but I still get the same output after unplugging and rebooting.

That “Unknown header type 7f” is lspci failing to read the PCIe configuration space (the reads come back as all 0xFF). So, again, problems with the PCIe link training or with the PCIe link signal quality.
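One way to confirm that, independent of lspci’s formatting, is to dump the first bytes of the device’s config space from sysfs. This sketch assumes the 0000:01:00.0 address from earlier; an all-ff result means every config read is failing:

```shell
BDF="0000:01:00.0"          # GPU address from the lspci output above
CFG="/sys/bus/pci/devices/$BDF/config"
if [ -r "$CFG" ]; then
    # The first 4 bytes are the Vendor ID / Device ID; "ff ff ff ff" means
    # config reads are failing, i.e. the device has dropped off the bus.
    hexdump -n 4 -e '4/1 "%02x " "\n"' "$CFG"
else
    echo "$BDF not present on this machine" >&2
fi
```

A healthy RTX 4070 should show the NVIDIA vendor ID first, i.e. “de 10 86 27” in little-endian byte order for PCI ID 10de:2786.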

But why does this happen only with a different BIOS version? I’ve never had any issues with v3.07.

I don’t know regarding the BIOS.
You can try unbinding the driver and binding it again, to try to force a reset.
If I were diagnosing this, I would put a vector scope on the lanes and check the eye diagrams.
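A sketch of that unbind-and-reset sequence via sysfs (run as root; assumes the 0000:01:00.0 address from the logs above, and is untested against this particular v4 BIOS problem):

```shell
BDF="0000:01:00.0"          # GPU address from the dmesg/lspci output above
DEV="/sys/bus/pci/devices/$BDF"

if [ -d "$DEV" ] && [ -w "$DEV/remove" ]; then
    # Unbind whatever driver currently holds the device
    [ -e "$DEV/driver" ] && echo "$BDF" > "$DEV/driver/unbind"
    # Remove the device, then rescan the bus so the kernel re-enumerates it
    # and re-reads the config space from scratch
    echo 1 > "$DEV/remove"
    sleep 1
    echo 1 > /sys/bus/pci/rescan
else
    echo "$BDF not present (or not writable) on this machine" >&2
fi
```

If the link itself is unstable, the rescan may simply fail the same way, but it rules out a stale kernel-side state.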

Here is lspci -vv on BIOS v3.07 when I run

__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glmark2

to put a load on it, so it kicks up to PCIe 4.0 x8 (I can also confirm this within CPU-X).

It does indeed go up from 2.5 GT/s to the 16 GT/s that PCIe 4.0 provides when I put a load on it. I don’t know what makes it refuse to work with the newer BIOS.

nvidia-smi also gives me a working regular output:

Sat Feb 14 11:26:10 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.48.01              Driver Version: 590.48.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   40C    P0             51W /  200W |      31MiB /  12282MiB |     94%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           66379      G   glmark2                                   8MiB |
+-----------------------------------------------------------------------------------------+
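For reference, the negotiated link can also be polled straight from nvidia-smi while glmark2 runs. These query-field names exist in current drivers, but it’s worth checking nvidia-smi --help-query-gpu for your version:

```shell
# PCIe link query fields for nvidia-smi; verify the names against
# `nvidia-smi --help-query-gpu` on your driver release
QUERY="pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current"
if command -v nvidia-smi >/dev/null; then
    nvidia-smi --query-gpu="$QUERY" --format=csv
else
    echo "nvidia-smi not available here" >&2
fi
```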

I would really appreciate help from the Framework team on this one, as it’s puzzling.

I tried an OCuLink cable that’s half the length of the one I usually use, and it gives the same result on the v4 BIOS. I still highly doubt it’s related to the board or cable, since I get the full 4.0 bandwidth on v3.07 with both cables. It seems like something just broke, or I’m doing something wrong, on the v4 BIOS.

I am unsure if this is a hardware or firmware issue at this point, since it’s just odd: how am I getting a full 4.0 link with one version while the link completely fails on another?

What’s even weirder is that a GPU on the other end is detected for a tester with a 7840HS, so it might be a 7940HS-specific issue. It did require some BIOS resets and battery disconnects (the GPU just wasn’t being detected, even though it was fully working on v3.07, so there’s still something funky going on where it just “forgets” that a PCIe device is connected), but they did get it running after multiple attempts. I tried the same, but to no avail.