[RESOLVED] Kernel 6.4 hang at boot TPM Bug Confirmed

A friend of mine does not face this issue with his Framework running Intel 11th Generation. Could you please confirm that it only affects Framework Laptops with 12th Generation CPUs?
EDIT: Sorry, it is working on 11th gen, but fails on 12th gen.

Same situation over here, also ArchLinux Kernel 6.4.1 on Framework Laptop 13 13th Gen.

Thx that you already figured out that the kernel update seems to be causing the problem, I was questioning my sanity for a short while xD

I bisected the issue and it is caused by commit e644b2f498d297a928efcb7ff6f900c27f8b788e

Building the kernel after git checkout v6.4 && git revert e644b2f498d297a928efcb7ff6f900c27f8b788e results in a kernel which is booting fine for me.
I am going to file a bug in the Arch Tracker.
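
For anyone who wants to reproduce the revert build described above, here is a minimal sketch. It assumes you are inside a clone of the mainline kernel tree and that a suitable `.config` is already in place. The `run` helper and the `DRY_RUN` variable are hypothetical names made up for this sketch, so the steps can be printed before anything is executed.

```shell
#!/bin/sh
# Sketch: rebuild v6.4 with the bisected commit reverted.
# Assumes the current directory is a kernel git tree with a usable .config.

BAD_COMMIT=e644b2f498d297a928efcb7ff6f900c27f8b788e

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

build_with_revert() {
    run git checkout v6.4 &&
    run git revert --no-edit "$BAD_COMMIT" &&
    run make olddefconfig &&            # fill in defaults for any new config options
    run make -j"$(nproc)"               # build with one job per CPU
}
```

Calling `DRY_RUN=1 build_with_revert` only prints the four commands; calling `build_with_revert` runs them. Packaging and installing the result is distro-specific and left out here.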

FS#78961 - Kernel 6.4 failing to access TPM on Framework Laptop 12th

Would be interesting to know whether there are any differences between the TPM chip of 11th gen and 12/13th gen.

@eumpf Sorry for my mistake. The 12th generation seems to face the issue while 11th gen works fine. Could you please confirm again that you face the issue with 13th gen?
Could you please also send the output of this?

# dmidecode -s system-manufacturer 
# dmidecode -s system-version
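
If dmidecode is not installed, the same two strings can usually be read without root from sysfs. A minimal sketch, assuming the standard `/sys/class/dmi/id` layout; `dmi_field`, `dmi_summary`, and the `DMI_DIR` override are names invented for this example.

```shell
#!/bin/sh
# Read DMI identification strings from sysfs instead of dmidecode.
# DMI_DIR can be overridden; that is only there to make the sketch easy to try out.
DMI_DIR="${DMI_DIR:-/sys/class/dmi/id}"

dmi_field() {
    # $1: sysfs file name, e.g. sys_vendor or product_version
    if [ -r "$DMI_DIR/$1" ]; then
        cat "$DMI_DIR/$1"
    else
        echo "unknown"
    fi
}

dmi_summary() {
    # sys_vendor corresponds to system-manufacturer,
    # product_version corresponds to system-version.
    echo "system-manufacturer: $(dmi_field sys_vendor)"
    echo "system-version: $(dmi_field product_version)"
}

dmi_summary
```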

Of course @TrailingEdge ! I can confirm that the same issue exists on the Framework Laptop 13 13th Gen with Archlinux default kernel 6.4.1.
I can also confirm that the Archlinux default 6.3.9 kernel does not suffer from this problem on this machine.

Here are the outputs you asked for:

# dmidecode -s system-manufacturer
Framework

# dmidecode -s system-version
A6

and some more:

# lshw -C bus -sanitize | head -n 7
*-core
       description: Motherboard
       product: FRANMCCP06
       vendor: Framework
       physical id: 0
       version: A6
       serial: [REMOVED]

# sudo lshw -C cpu -sanitize | head -n 6
*-cpu
       description: CPU
       product: 13th Gen Intel(R) Core(TM) i7-1360P
       vendor: Intel Corp.
       physical id: 4
       bus info: cpu@0

If there’s any way I can help (e.g. test kernel with the offending commit reverted on 13th gen) I’ll gladly do so.

BTW: during boot (specifically pcrphase etc.) my system did not hang completely; it just took ages before it continued (not sure if the systemd services timed out or were just reaaallly slow).

EDIT: The commit testing for TPM IRQs reminds me of these warnings I have noticed in dmesg (but I have always had them):

tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
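
To check quickly whether a given boot logged that warning, something like the following sketch could be used. The function name `tpm_fell_back_to_polling` is made up for this example; it simply searches whatever kernel log text is piped into it.

```shell
#!/bin/sh
# Succeeds (exit 0) if the kernel log on stdin contains the polling-fallback warning.
tpm_fell_back_to_polling() {
    grep -q 'TPM interrupt not working, polling instead'
}

# Usage (reading the kernel ring buffer may need root):
#   dmesg | tpm_fell_back_to_polling && echo "TPM fell back to polling"
```

Note that a missing warning only means the message was not logged; on its own it does not prove that interrupts are working.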

Should anyone want to test this against the Ubuntu Mainline RC of 6.4.* as a comparison, that would be extremely helpful. It would be interesting to see if it is affected as well.

Same here! Tried adding LUKS + TPM2 but the enrollment was taking ages.
Switched to the linux-lts kernel and enrollment took less than 5 seconds.

Can’t you just roll to an earlier kernel, 6.3#? A newer kernel is usually expected to have issues, so I don’t find this surprising.

If we just roll back to an earlier kernel, we will run into the issue again as soon as the kernel updates. Rolling back does not fix the TPM being broken on newer kernels; we need to raise attention with the kernel maintainers so they address the issue in the kernel.

Same issue here, Framework 12th Gen. TPM unlock works fine with kernel 6.3.9.

If you want to add details and raise attention on this bug, here is the link:
https://bugzilla.kernel.org/show_bug.cgi?id=217631

Adding tpm_tis.interrupts=0 to the kernel command line is a workaround for this issue (forces TPM back to polling).
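
After rebooting, it can be worth confirming that the parameter actually reached the kernel. A minimal sketch; `has_tpm_polling_workaround` is a name invented for this example, and with no argument it reads the running kernel's `/proc/cmdline`.

```shell
#!/bin/sh
# Succeeds if the given kernel command line contains tpm_tis.interrupts=0.
# With no argument, the running kernel's /proc/cmdline is checked.
has_tpm_polling_workaround() {
    cmdline="${1-$(cat /proc/cmdline 2>/dev/null)}"
    # Pad with spaces so only a whole, exact parameter matches.
    case " $cmdline " in
        *" tpm_tis.interrupts=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_tpm_polling_workaround && echo "workaround active" || echo "workaround missing"
```

How to make the parameter permanent depends on the bootloader. Assuming a typical setup: with GRUB it goes into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` followed by regenerating the config with `grub-mkconfig`, and with systemd-boot it goes on the `options` line of the loader entry.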

(The job which hangs during boot on my 12th gen with kernel 6.4.2-arch1 is TPM2 PCR Machine ID Measurement; I am repeating the name here to make it easier to find.)

One method to immediately boot and apply a workaround is to temporarily set “TPM Availability” to “Hidden” in the BIOS (plus temporarily disable Secure Boot if it’s enabled.)

Interestingly, my system booted without delay once out of a half-dozen tries, so the root cause is probably some kind of race condition.

Yep it works! Thanks a lot!!

Great tip - thanks!

Yes that kernel option works.

Though this is probably not an ideal situation. I was wondering if anyone had tried to use the Linux Clear Kernel, as it has not updated to 6.4.1.

Seems as though a patch was submitted upstream.
https://lore.kernel.org/regressions/20230710133836.4367-2-mail@eworm.de/

Incidentally, the fix appears to be to force polling when running on a Framework board.

They are matching DMI_PRODUCT_VERSION to A6, whereas on my 12th-gen it’s A8 (per the output of dmidecode), so I fear this patch would be insufficient…
Anyone in the loop to notify the devs?

Edit: Ooh I don’t know anymore, in a previous mail of this kernel thread, they have a patch with A4 instead… and say it’s for 12th-gen… weird…
I guess I am missing pieces of information to understand.

The only certainty is that on my machine it’s neither A4 nor A6, it’s A8, for whatever reason…

Edit 2: I added a comment on the Bugzilla page.
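
For others who want to check whether their board would be covered by a version-based match, here is a rough sketch. It only compares the local DMI product version against a list you pass in; the values A4 and A6 used in the usage line are the ones mentioned in this thread, not something verified against the final patch. `product_version_in` and the `DMI_DIR` override are hypothetical names for this example.

```shell
#!/bin/sh
# Succeeds if this machine's DMI product version equals one of the arguments.
# Returns 2 if the version cannot be read at all.
DMI_DIR="${DMI_DIR:-/sys/class/dmi/id}"

product_version_in() {
    version=$(cat "$DMI_DIR/product_version" 2>/dev/null) || return 2
    for candidate in "$@"; do
        [ "$version" = "$candidate" ] && return 0
    done
    return 1
}

# Usage:
#   product_version_in A4 A6 || echo "this board's version is not in the list"
```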

Merged two threads dealing with the same issue together. This seems to affect all distros, not just Ubuntu.

I guess it depends on what each of us understands by “not an ideal situation”, but FWIW the workaround flag is doing the exact same thing that the proposed patches are:

  • Before kernel 6.4, the test for working TPM interrupts didn’t run correctly and the kernel would always fall back to polling mode and log [Firmware Bug]: TPM interrupt not working, polling instead.
  • In 6.4 the test was fixed to run, but there seems to be a regression where it reports a false positive and the Framework TPM gets into a bad state when using interrupts.
  • Applying the command line option forces the TPM to disable interrupts and go back to polling mode.
  • The proposed patches add Framework models to the list of devices with known broken TPM interrupts, so the kernel will automatically select polling mode.

It’s possible that at some point whatever causes the “interrupt storm” on these laptops will be fixed and interrupts can be restored. But for now there’s not really any downside to explicitly disabling TPM interrupts on the command line, versus rolling back to an older kernel that disables TPM interrupts in a different way, or waiting for the next kernel which will disable TPM interrupts in a third way.
