We will take a look at this issue and see what is going on.
Keyboard events go through an interface separate from the ACPI interface.
However, the EC will disable all function keys if the OS is not in ACPI mode, so if all function keys stop working, including the top row, this may be the cause.
The EC needs a host command from the BIOS on boot to enable the function keys. This is done in host_command_customization.c, in sci_enable(void).
The keyboard events are going through the 8042 interface normally; it’s just that the scancodes (and behaviour) are wrong.
I’m pretty sure ACPI is enabled, since this happens in the middle of using an otherwise normally functioning laptop (including features that I believe require ACPI, like suspend/resume). So “Fn key stops working” literally means “while using the laptop”, not “after some boots/resumes”.
Is there a way to check/confirm that Linux is still in ACPI mode?
I’ve looked at sci_enable; the key state appears to be bit 0 of byte 0 of the “customer_memmap” region. Because of the call to pos_get_state() in keyboard_scancode_callback(), we can infer that the bit must be set whenever the Fn key is working. I think that suggests ACPI has been enabled and sci_enable has probably run properly (through its main path)?
If that bit has somehow been cleared, that could explain the symptoms. Likewise, if factory_enable has somehow been set, that could also explain it. My more detailed analysis, from an outsider’s perspective, is in this earlier comment, in case it helps you get started…
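The two failure paths above can be modelled in a few lines of C. This is a simplified sketch, not the EC firmware: customer_memmap, factory_enable, model_sci_enable() and fn_keys_enabled() are hypothetical stand-ins, and the sketch assumes (as described above) that sci_enable sets bit 0 of byte 0, and that either a cleared bit or a set factory_enable disables the Fn layer.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical stand-ins for EC state; NOT the real firmware definitions. */
static uint8_t customer_memmap[16]; /* byte 0, bit 0: assumed "OS in ACPI mode" flag */
static bool factory_enable;         /* assumed to suppress Fn handling when set */

/* Models the effect attributed to sci_enable(): mark ACPI mode as enabled. */
static void model_sci_enable(void)
{
    customer_memmap[0] |= BIT(0);
}

/* Models the checks discussed above: the Fn layer works only while the
 * ACPI bit is set and factory mode is off. */
static bool fn_keys_enabled(void)
{
    return (customer_memmap[0] & BIT(0)) && !factory_enable;
}
```

Clearing the bit or setting the flag each makes fn_keys_enabled() return false, which is why both are candidate explanations for the symptom.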
Previously, I tried increasing how often ectool pwmgetfanrpm is run, to see if it affected the issue (it didn’t, or not noticeably). Today I tried the opposite test: completely disabling those calls. I spent 90 minutes in a physical environment/situation where the problem has been readily reproducing for me, and everything was fine. I then re-enabled the ectool pwmgetfanrpm calls (which run every ~0.5 s) without changing anything else (including the environment/situation), and within 10 minutes the problem had recurred and the Fn key was no longer working.
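For anyone trying to reproduce this, a polling loop like the one described (one ectool pwmgetfanrpm call every ~0.5 s) can be sketched in C as below. This is an illustrative helper, not the script used above; poll_command() is a hypothetical name, and the sketch assumes ectool is on PATH and runnable by the current user.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: run `cmd` through the shell `iterations` times,
 * sleeping `interval_ms` between runs. Returns how many runs exited
 * non-zero or could not be started. Assumes the command (e.g. ectool)
 * is on PATH. */
static int poll_command(const char *cmd, int iterations, long interval_ms)
{
    struct timespec delay = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L,
    };
    int failures = 0;

    for (int i = 0; i < iterations; i++) {
        if (system(cmd) != 0)
            failures++;
        if (i + 1 < iterations) /* no sleep after the last run */
            nanosleep(&delay, NULL);
    }
    return failures;
}
```

For example, poll_command("ectool pwmgetfanrpm", 1200, 500) issues 1200 calls about half a second apart, roughly the 10-minute window in which the problem recurred above (slightly longer, since each call’s own runtime adds to the interval).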
Based on this, I’m now pretty convinced that, at least in my case, the ectool usage is causally related. At least one commenter previously said that they weren’t using ectool, but perhaps some other EC communication (e.g. by one of the cros_ec* kernel modules?) can cause the problem, perhaps at a lower frequency/incidence.
I think this also correlates with all the dmesg messages I’m sometimes seeing about EC message checksum/packet length errors, and with ectool segfaults (though I don’t understand why I only sometimes get dmesg complaints about raw port I/O being locked down…). I’m pretty much back to my original hypotheses: either an ectool command request gets mangled such that the EC ends up running 0x3E02 (EC_CMD_FACTORY_MODE) with a non-zero argument, or 0x3E07 (EC_CMD_CUSTOM_HELLO), or something else is causing *host_get_customer_memmap(0x00) & BIT(0) to be cleared or factory_enable to be set (e.g. a buffer overrun somewhere in the EC?).
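For context on the checksum errors: assuming the cros_ec host command protocol v3 convention (an 8-byte struct ec_host_request header with the checksum in byte 1, followed by the payload, where all bytes including the checksum sum to zero modulo 256), the sender and receiver sides can be sketched as below. The helper names are hypothetical. The sketch also shows why a checksum alone cannot rule out the mangled-request hypothesis: two compensating byte changes, such as raising the command’s low byte from 0x02 (0x3E02) to 0x07 (0x3E07) while lowering a payload byte by 5, leave the sum unchanged. This only illustrates the limitation; it is not evidence that such a corruption actually happens.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum all bytes modulo 256. */
static uint8_t byte_sum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Sender side (hypothetical helper): set buf[checksum_offset] so that the
 * whole buffer sums to zero modulo 256, per the assumed v3 convention. */
static void set_checksum(uint8_t *buf, size_t len, size_t checksum_offset)
{
    buf[checksum_offset] = 0;
    buf[checksum_offset] = (uint8_t)(0x100 - byte_sum(buf, len));
}

/* Receiver side (hypothetical helper): accept only if all bytes sum to 0. */
static int checksum_ok(const uint8_t *buf, size_t len)
{
    return byte_sum(buf, len) == 0;
}
```

For example, a request {3, 0, 0x02, 0x3E, 0, 0, 1, 0, 0x10} (version 3, checksum placeholder, command 0x3E02 little-endian, one payload byte) passes after set_checksum(req, 9, 1); bumping byte 2 to 0x07 alone fails the check, but also subtracting 5 from the payload byte makes it pass again.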
I’m using the ectool from DHowett/framework-ec@d5b5b50. Is there a better/official version that I should use instead? For now I’m leaving the ectool fan reporting disabled, and will see if the problem still happens but less frequently…
Ok, I’ve confirmed that it can happen even without using ectool, it just takes a lot longer to occur. Since it happens without ectool, but more frequently with it, it seems unlikely to be something that the cros_ec_lpcs (or whatever) module is doing, and is more likely to be something on the EC side?
This has been happening to my 11th Gen (Gen 1, batch 1) for months. I’m a fairly low-speed, high-drag user running pop!OS, and I see it only in between sleeps. The only way I clear the error is with a reboot. I have not been bothered enough to do any real troubleshooting, but the more days between reboots, the more likely I will see it.
I had the issue yesterday; maybe I should’ve posted here instead of in the other thread. I’ve since rebooted, so it’s gone away, but the dmesg output should still be of interest:
I’ve also been seeing this issue recently, on an 11th gen running Ubuntu 22.10.
I’m pretty sure this never happened when I first got the laptop a year ago. I believe I first saw this happen about a month ago, maybe a bit more. Looking back, it could have been a BIOS update (I updated to 3.17 in January, though I think I first saw the issue one or two months after that), or maybe an Ubuntu upgrade (I upgraded from 21.something to 22.10 one or two months ago).
As for ectool - I have a version compiled and installed that I have been using to manually check the ec console every now and then, but I’m not using ectool periodically for anything, so that cannot be the cause for me.
I’m also seeing the packet too long errors (and saw bad checksum once) in dmesg.
As for when this happens: I have the feeling it is related to suspending or USB-C docking/undocking, but I also realize there is a bias there. When I’m docked, I use a different keyboard, so it might very well already be broken when I dock, and I won’t notice until I undock, suspend my laptop, and then resume it when I’m on the road.
There is definitely something going on regarding uptime. I recently rebooted after ~70 days. Before doing that, I reinstated my ectool polling, and the problem easily re-occurred in under 10 mins. However, after rebooting, with the same setup, I’m unable to make the problem re-occur, even after running the same polling for several hours.
The main difference that I’m aware of (apart from how long since the system had been rebooted) is a slightly more recent Ubuntu 22.04 kernel: I’m now on 6.0.0-1014-oem, whereas before it was 6.0.0-1010-oem. I guess another difference is that the system is generally more lightly loaded than before (fewer processes, and less CPU and memory currently in use). Over time I’ll see if the problem eventually recurs on this slightly more recent kernel version.
This also started happening to me on Ubuntu 22.10 shortly after updating the firmware from 0.0.3.4 to 0.0.3.17. I’ve downgraded to 0.0.3.10 and will report back on whether this resolves the issue.
Everything seemed to be going well with the firmware downgrade to 0.0.3.10, but a few days ago the Fn key died again and I had to reboot to bring it back. I’m back on 0.0.3.17 and seeing the same behavior occasionally.
As this is happening on Ubuntu as well, did anything interesting happen before this took place? Suspend, repeated lid opening and closing? I’ve been actively tracking an 11th gen issue that I have not been able to replicate; however, if it is related, I have a workaround that may stop the behavior.
Let me know if a suspend happened before this.
Yes, for me on Ubuntu, this is always correlated with a resume from a suspend. It doesn’t happen every time, but it will generally happen at least once every 5-7 days (and then I reboot to fix it). Prior to this starting (potentially with the Ubuntu 22.10 upgrade?), I had gone a year without any problems whatsoever.
By the way, I just want to point out that this happens on 12th gen.
I didn’t know that this thread was 11th gen only:
I switched from 11th-gen to 12th-gen some time last year, and the issue still happens from time to time.
This looks to be different. 11th gen users were seeing this happen with some suspend/resume activity, and it looks like you’re seeing this with Fn lock after unplugging a TB dock.
@Matt_Hartley thanks for the tip about blacklisting the cros_ec_lpcs module, I’m trying that now. It still seems to get loaded even with the modprobe.d file:
```
chris@vega:~ ☸ home in 112ms @ 20:34:32 $ cat /etc/modprobe.d/no_cros_ec.conf
# Blacklists a kernel module that causes the Fn key to stop working
# https://community.frame.work/t/tracking-fn-key-stops-working-on-popos-after-a-while/21208/32
blacklist cros_ec_lpcs
chris@vega:~ ☸ home in 109ms @ 20:35:42 $ lsmod | grep cros_
cros_usbpd_charger 20480 0
cros_usbpd_logger 20480 0
cros_usbpd_notify 20480 1 cros_usbpd_charger
cros_ec_debugfs 16384 0
cros_ec_chardev 16384 0
cros_ec_sysfs 16384 0
cros_ec_dev 16384 0
cros_ec_lpcs 16384 0
cros_ec 20480 1 cros_ec_lpcs
```
Do you have any advice on a stronger way to blacklist it?
Ah, an update on the modprobe.d solution: after adding a modprobe configuration file, you also have to run sudo update-initramfs -u for it to take effect on the next boot. I’m now running without those cros_ modules and will report back after a week or so.
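After rebooting, one way to confirm the blacklist took effect is to check /proc/modules, which is where lsmod gets its list. Below is a minimal C sketch; line_names_module() and module_loaded() are hypothetical helper names, and the sketch assumes the standard /proc/modules format with the module name in the first space-separated column.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if one /proc/modules line starts with `name` followed by a space,
 * so that "cros_ec" does not match "cros_ec_lpcs". */
static bool line_names_module(const char *line, const char *name)
{
    size_t len = strlen(name);
    return strncmp(line, name, len) == 0 && line[len] == ' ';
}

/* Hypothetical helper: scan /proc/modules for `name`.
 * Returns false if the file cannot be opened (e.g. /proc not mounted). */
static bool module_loaded(const char *name)
{
    char line[4096];
    bool found = false;
    FILE *f = fopen("/proc/modules", "r");
    if (!f)
        return false;
    while (!found && fgets(line, sizeof line, f))
        found = line_names_module(line, name);
    fclose(f);
    return found;
}
```

Once the initramfs has been rebuilt and the machine rebooted, module_loaded("cros_ec_lpcs") would be expected to return false if the blacklist is working.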