When starting the system, the Fn keys work just fine. I can press Fn+Left and it triggers Home, and so on.
But after some minutes or hours, the keys just stop working. Pressing Fn+Left just triggers Left, and none of the other special keys work either.
Sometimes the Fn keys then start working again after a while. I haven’t found any pattern for reproducing this so far.
I’ve just hit this for the first time in a month of use.
After a reboot, the Fn key came back.
The machine is an i7-1165G7, running Artix Linux (Arch-based, without systemd), kernel 5.19.8-artix1-1.
I’ve also hit this problem a few times recently, and haven’t been able to figure out circumstances that cause it.
To fix it I normally close the lid to put the laptop to sleep, and then (after waiting 30 secs or so) reopen it to wake it up again, and then things are back to normal. I’ve never seen it spontaneously fix itself, except over a suspend-resume cycle (but then after noticing it, I also haven’t really waited very long).
I haven’t yet checked whether Fn+Space works to control the keyboard backlight (or just emits a regular space character), but so far this looks very much like a low-level firmware/EC problem. I use xev with Left vs Fn-Left (= Home, normally) to test whether it’s happened, but next time I will use sudo evtest to check at the lowest possible Linux level.
It’s also very weird because, while the Left/Right/Up/Down/Delete keys still output Left/Right/Up/Down/Delete even when Fn is held, the keys along the top of the keyboard always output F1-F12 and never the special keys, regardless of whether the Fn key is held! Pressing Fn-Esc (to enable/disable Fn-lock) doesn’t change anything.
I’m using a 12th gen Framework, with Ubuntu 22.04, kernel 6.0.0-1010-oem, BIOS 3.05. I have the Fn-swap option (i.e. to swap the Fn and Ctrl keys) enabled in the BIOS.
EDIT: I forgot to mention, I have the hid_sensor_hub module blacklisted and not loaded.
I’m also seeing this issue; I’ve run into it twice in the last month. As above: it triggers randomly, and it acts as if the Fn key simply does nothing at all for Fn-lock/F# keys/keyboard backlight/arrow keys. In every case, it’s as if Fn is not pressed (e.g. showkey -k prints the same keycode with or without Fn). It resolves itself after a reboot or a suspend-resume cycle.
Using Ubuntu 22.10, gen12 framework, kernel 5.19.0-29-generic with module_blacklist=hid_sensor_hub set.
I hit this again just now. Still can’t determine anything causal, though it seems to maybe happen more when the laptop is disconnected from power?
I confirmed with evtest that the Linux kernel is receiving the non-Fn keystrokes from the keyboard. This includes receiving events for key-up/key-down of the Space key when Fn-Space is pressed, and the keyboard backlight doesn’t change. Whereas when the system is fine, pressing Fn-Space doesn’t generate any key-up/down events at the Linux evdev layer.
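For anyone who wants to run the same check without evtest, it boils down to reading struct input_event records from the keyboard’s /dev/input/eventN node. Below is a minimal C sketch of that; the function names are my own (hypothetical), the event node number varies per machine, and like sudo evtest it needs root to open the device.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/input.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format one evdev event; returns 1 for a key event, 0 for anything else. */
int describe_key_event(const struct input_event *ev, char *buf, size_t len)
{
    /* evdev key values: 0 = released, 1 = pressed, 2 = autorepeat */
    static const char *const states[] = { "up", "down", "repeat" };

    if (ev->type != EV_KEY || ev->value < 0 || ev->value > 2)
        return 0;
    snprintf(buf, len, "code %u %s", (unsigned)ev->code, states[ev->value]);
    return 1;
}

/* Print key events from an evdev node (e.g. /dev/input/eventN) until a read fails. */
int watch_keys(const char *path)
{
    struct input_event ev;
    char line[64];
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
        return -1;
    }
    /* Each read() returns whole input_event records. */
    while (read(fd, &ev, sizeof ev) == (ssize_t)sizeof ev) {
        if (describe_key_event(&ev, line, sizeof line))
            printf("%s\n", line);
    }
    close(fd);
    return 0;
}
```

Called from a small main() as watch_keys("/dev/input/event3") (the node is just an example), Fn-Space on a healthy system should print nothing, whereas in the broken state it should print code 57 (KEY_SPACE) down/up, matching the evtest observation above.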
Does anyone on this thread use ectool? I have it installed and polling fan speed every ~2 secs, and I’m wondering if it might be the cause.
I just had this happen again. I had definitely checked after resuming from suspend (and everything was fine), but after leaving my laptop for ~30 mins (on battery power, without it suspending), it wasn’t working again when I returned.
So this rules out the hypothesis that it’s caused by something going wonky during resume from suspend. I also noticed this in my dmesg output:
cros_ec_lpcs cros_ec_lpcs.0: bad packet checksum 01
cros_ec_lpcs cros_ec_lpcs.0: bad packet checksum e7
cros_ec_lpcs cros_ec_lpcs.0: packet too long (29810 bytes, expected 8)
ectool[1493750]: segfault at 0 ip 0000000000000000 sp 00007fffec92c4c8 error 14 in ectool[5604a8ae3000+5000]
Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
cros_ec_lpcs cros_ec_lpcs.0: bad packet checksum 95
Lockdown: ectool: raw io port access is restricted; see man kernel_lockdown.7
cros_ec_lpcs cros_ec_lpcs.0: bad packet checksum e7
cros_ec_lpcs cros_ec_lpcs.0: bad packet checksum e7
cros_ec_lpcs cros_ec_lpcs.0: packet too long (264 bytes, expected 8)
cros_ec_lpcs cros_ec_lpcs.0: bad packet checksum e7
The checksum and “packet too long” errors are common, and I see them at other times also, without noticing this problem. The segfault is less common; I’ve seen it before, but not often, and haven’t tried to correlate it with this keyboard problem. There weren’t any such messages yesterday around the time that I encountered the problem.
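As far as I understand the cros_ec host command protocol (v3), each request and response carries a one-byte checksum chosen so that all bytes of the packet sum to zero modulo 256, and the kernel’s “bad packet checksum” message means a response failed that test. A simplified C sketch of that scheme, assuming a packet is just “some bytes with a checksum byte at a known offset” (the helper names are mine, not the kernel’s or the EC’s):

```c
#include <stddef.h>
#include <stdint.h>

/* 8-bit sum of a byte buffer, wrapping modulo 256. */
static uint8_t byte_sum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum;
}

/* Set the checksum byte so that the whole packet sums to zero (mod 256). */
void seal_packet(uint8_t *pkt, size_t len, size_t csum_offset)
{
    pkt[csum_offset] = 0; /* exclude any stale checksum from the sum */
    pkt[csum_offset] = (uint8_t)(0u - byte_sum(pkt, len));
}

/* The receiver accepts the packet only if every byte, checksum included, sums to zero. */
int packet_ok(const uint8_t *pkt, size_t len)
{
    return byte_sum(pkt, len) == 0;
}
```

One consequence of an additive checksum like this: corruption confined to a single byte is always caught, but two errors that cancel out (one byte +1, another -1) pass unnoticed. So the “bad packet checksum” lines show corruption being caught, not that every corruption is caught.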
In any case, some sort of interaction with ectool seems to me like the most likely candidate, currently. I might try to increase the polling rate on my system, to see if the problem occurs more often…
I had this happen again today. I think it was while I was plugged into power, but I’m not 100% sure, mostly because I don’t regularly check if the keys are working - I just go to use one, and notice that it doesn’t work.
But the interesting thing is that it fixed itself unexpectedly without requiring a suspend-resume cycle!
My normal setup is using the laptop at my desk, with a USB-C dock providing USB PD + DP-AltMode (first external monitor) + a few USB-A peripherals, and the Framework HDMI adapter (second external monitor). Sometimes I need to leave my desk (e.g. to go to a meeting room), in which case I unplug both cables and disable the external monitors in software (with xrandr; I still use Xorg, and if I don’t do this, the DP/HDMI screens don’t re-attach properly on reconnect), but I usually don’t put the laptop to sleep.
Today the keys were working fine at the start of the day, and I had one session away from my desk. At some point after returning to my desk and plugging back in, I noticed that the keys weren’t working again. I put up with it for a while, re-checking it periodically. Eventually I gave up and decided to do a suspend-resume cycle to fix it. In preparation for this, I unplugged the 2 cables and disabled the external screens in software. The surprising part: when I then checked the keys again, they were completely back to normal!
(External monitors connect either via USB-C or via the Framework HDMI adapter, both of which I believe use DP-AltMode in much the same way.)
Over the coming days I will be keeping a much closer eye on the status of the keys before/after each of these events, to try to narrow things down further.
I have this happen too, but it seems to happen after a suspend/resume. I don’t recall it ever happening spontaneously but I’ll keep an eye out for that. I have an 11th gen on NixOS with kernel 6.2.2. One thing I’m testing is whether it is related to deep sleep vs s2idle.
The first case is interesting because it showed that if the problem is related to plugging/unplugging usb-c/hdmi, it can take more than a few seconds to manifest. The timeline was something like:
Resume from suspend, keys were fine.
Plug into USB-C and HDMI. Keys remained fine through the stages of doing this.
Some time later (1-2 hrs?), unplugged HDMI and USB-C. Keys were fine when I checked a few seconds after each cable was unplugged and after the external displays were disabled in xrandr.
~5 mins later, the keys were not working.
The second case was interesting because it completely rules out plugging/unplugging USB-C/HDMI. The timeline was something like:
Resume from suspend, keys were fine.
Don’t plug in anything.
~20 mins after resuming, the keys were not working anymore. I was checking every few minutes (at most 5 apart), and the keys had been fine until they weren’t.
During this time I didn’t enable/disable any hardware, plug/unplug anything, or move the laptop. It was just typing and touchpad use with the laptop on a desk, connected to Wi-Fi. (The Wi-Fi is flaky, and has a habit of roaming between APs on the same ESSID every few mins. But surely this shouldn’t affect the EC? Also, the earlier case above was on a different Wi-Fi network with a single BSSID.)
So, based on this I agree that the cause is more likely somehow related to resume from suspend (but with some sort of delay), or some completely different and unrelated trigger.
I’ve had a look at the EC source code, in particular for why the arrow keys behave as if Fn isn’t pressed, but the F1-F12 keys behave as if Fn has been pressed.
The relevant function seems to be keyboard_scancode_callback in keyboard_customization.c. In particular, it calls:
hotkey_F1_F12 which is responsible for converting F1-F12 scancodes into media key scancodes when Fn isn’t pressed (unless Fnlock is enabled).
If keyboard_scancode_callback doesn’t make it as far as these three functions (hotkey_F1_F12, hotkey_special_key, functional_hotkey), then that would be consistent with the observed behaviour. Above the calls to these three functions are a few places where the function returns early.
Another is if the device is in “factory state”, which I assume is either the shipping mode or a factory test mode. factory_status() just returns the value of factory_enable, which is static and only set in factory_setting(). That function is only called by factory_mode in host_command_customization.c. If this EC host command (0x3E02) is called with a non-zero arg (regardless of whether the arg is RESET_FOR_SHIP), then that could explain these symptoms.
Another is checking for “preOS” state. pos_get_state just returns bit 0 of *host_get_customer_memmap(0x00). This appears to be the first byte of a (shared?) comms buffer between the EC and host. The only place in the EC code that writes bit 0 is the “custom hello” EC host command, where it gets cleared. Since the scancode function tests !pos_get_state(), if the bit is cleared, then the keyboard routine will not do any Fn key handling.
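To make the two early-return paths concrete, here is a heavily simplified C model of them as I read the code. It is not the EC source: pos_get_state and factory_status are only named after the real functions, the hypothetical_* helpers and the memmap array are stand-ins I made up, the order of the checks in the real callback may differ, and the real callback does much more.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the EC state discussed above. */
static uint8_t customer_memmap[16]; /* byte 0, bit 0: the "preOS" flag */
static int factory_enable;          /* nonzero while in "factory state" */

/* Named after the EC functions; simplified to just read the state. */
static bool pos_get_state(void) { return customer_memmap[0] & 0x01; }
static int factory_status(void) { return factory_enable; }

/* Hypothetical models of the host-side events that touch this state. */
void hypothetical_custom_hello(void) { customer_memmap[0] &= (uint8_t)~0x01; }
void hypothetical_acpi_set_pos(void) { customer_memmap[0] |= 0x01; }
/* Assumption: any non-zero argument to host command 0x3E02 sets the flag. */
void hypothetical_factory_mode(uint8_t arg) { if (arg) factory_enable = 1; }

enum fn_result { FN_SKIPPED, FN_APPLIED };

/* Would keyboard_scancode_callback get as far as the hotkey functions? */
enum fn_result scancode_callback_model(void)
{
    if (factory_status())
        return FN_SKIPPED; /* early return: factory state */
    if (!pos_get_state())
        return FN_SKIPPED; /* early return: preOS bit cleared */
    /* Here the real code goes on to hotkey_F1_F12(), hotkey_special_key()
     * and functional_hotkey(), which do the Fn/media-key translation. */
    return FN_APPLIED;
}
```

In this model, either a cleared preOS bit or a set factory_enable gives the split seen above: arrow keys come through untranslated (Fn ignored), and the top row stays as plain F1-F12 because hotkey_F1_F12 never runs.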
So, based on this, my suspicions are returning to ectool shenanigans, and/or the dmesg errors I’m seeing from the cros_ec_lpcs kernel module…
FWIW, the memory region in question (ER1) is also manipulated directly by ACPI on the AP side.
Excellent investigation! I also believe the pos bit to be implicated in this issue.
There is a small window during which in-flight EC exchanges can get corrupted; however, Custom Hello is sent before any code outside the firmware runs. I do not believe the same to be true of ACPI writes to this part of ER1.
I think the flow is that the firmware sends Custom Hello, and then the OS eventually triggers an ACPI method that sets that bit. That should happen early on in ACPI init, though, before any userland EC exchanges.
(Edited for further speculation, and to flip the sense on whether or not I believe ACPI writes to ER1 happen before the firmware exits.)
For me it has never happened after suspend/resume, since I have that feature fully disabled.
Sometimes it happens while typing/working: suddenly Fn doesn’t work anymore. And after a random amount of time, it works again. I only briefly looked at dmesg and the systemd logs, but couldn’t see any suspicious log messages.
Interesting, so this could (at least theoretically) be caused by stray ACPI-related writes?
This is kind of what I’m thinking. I’m regularly running ectool pwmgetfanrpm, which runs EC_CMD_GET_FEATURES (0x000D), and I often see dmesg errors complaining about EC command checksums and unexpected length mismatches (presumably of the responses). So I think this corruption is actually happening reasonably often. If the command request id gets overwritten to be 0x3E02 or 0x3E07, then that could explain the symptoms.
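One quick sanity check on the “command id gets overwritten” idea is to count how many bits would have to flip to turn EC_CMD_GET_FEATURES (0x000D) into 0x3E02 or 0x3E07. A small C helper for that (the name bit_distance is my own):

```c
#include <stdint.h>

/* Number of bit positions in which two 16-bit command codes differ. */
int bit_distance(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(a ^ b);
    int count = 0;

    while (diff) {
        diff &= (uint16_t)(diff - 1); /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

By this count it is 9 bits for 0x3E02 and 7 bits for 0x3E07, so a single flipped bit would not do it. If this hypothesis holds, the corruption would have to be something larger, such as whole bytes being overwritten.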
Another possibility could be a buffer overflow accidentally overwriting the stuff in this region. Do you know what normally sits immediately below it?
I don’t understand this, I thought these EC commands like EC_CMD_CUSTOM_HELLO were commands that could be sent to the EC (and run on the EC) by the host?
Do you know if there’s any way to inspect the values in these regions from inside Linux? It would be very useful to be able to check these bits the next time the issue happens.
Similarly, do you know if there might be any workarounds or mitigations? E.g. any way to forcibly set the values from inside Linux? Or otherwise perhaps “reset the EC on the fly”? (Or would that be a bad idea? It seems like one.)
After not happening much last week, this week it is back to happening several times a day. I’m starting to wonder if it might somehow be environmental?! It seems more likely to occur when running on batteries, and/or when in a noisier RF environment, and/or when moving the laptop from one place to another (without closing it, eg. between rooms). It’s also still much more likely to spontaneously go into the bad mode (no Fn key), than back to working fine (though that has still happened a few times). I really wish there was a way to programmatically introspect the state of the EC.
I’m pretty confident everything was working fine at my desk (plugged into HDMI/power/USB keyboard) on the weekend. I closed it there to suspend it, moved it, and opened it now on battery power with nothing attached and the FN key has failed again.
I’m not sure if there’s anything super relevant, but the dmesg output covering the suspend/wake up is here:
We will take a look at this issue and see what is going on.
Keyboard events go through a separate interface to the ACPI interface.
However, the EC will disable all function keys if the OS is not in ACPI mode, so if all function keys stop working, including the top row, this may be the cause.
The EC needs a host command from the BIOS on boot to enable function keys.
This is done in host_command_customization.c, in sci_enable(void).
The keyboard events are going normally through the 8042 interface, it’s just that the scancodes (and behaviour) are wrong.
I’m pretty sure ACPI is enabled, since this happens in the middle of using an otherwise completely normally functioning laptop (including features that I believe require ACPI, like suspend-resume). And so “Fn key stops working” does literally mean “while using the laptop”, and not “after some boots/resumes”.
Is there a way to check/confirm that Linux is still in ACPI mode?
I’ve looked at sci_enable; the relevant state appears to be bit 0 of byte 0 of the “customer_memmap” region. The call to pos_get_state() in keyboard_scancode_callback() means that, when the Fn key is working, we can infer that the bit must be set. Which I think suggests that ACPI has been enabled and sci_enable has probably run properly (through its main path)?
But if that bit has somehow been cleared, then that could explain the symptoms. Equally, if factory_enable has somehow been set, that could also explain it. My detailed (outsider’s-perspective) analysis is in this earlier comment, in case it helps you get started…
Previously, I tried increasing how often ectool pwmgetfanrpm is run, to see if it affected the issue (it didn’t, or not noticeably). Today I tried the opposite test: completely disabling those calls. I spent 90 minutes in a physical environment/situation where the problem has been readily reproducing for me, and everything was fine. I then re-enabled the ectool pwmgetfanrpm calls (which run every ~0.5 secs) without changing anything else (including the environment/situation), and within 10 minutes the problem had recurred and the Fn key was no longer working.
Based on this I’m now pretty convinced that, at least in my case, the ectool usage is causally related. At least one commenter previously said that they weren’t using ectool, but perhaps some other EC comms (eg. by one of the cros_ec* kernel modules?) can cause the problem (perhaps with a lower frequency / incidence).
I think this also correlates with all the dmesg errors I’m sometimes seeing about EC message checksum/packet length errors and ectool segfaults (though I don’t understand why I only sometimes get dmesg complaints about raw port I/O being locked down…). I’m pretty much back to my original hypotheses: an ectool command request gets mangled such that the EC ends up running 0x3E02 (EC_CMD_FACTORY_MODE) with a non-zero arg, or 0x3E07 (EC_CMD_CUSTOM_HELLO); or something else is causing *host_get_customer_memmap(0x00) & BIT(0) to be cleared or factory_enable to be set (e.g. a buffer overrun somewhere in the EC?).
I’m using the ectool from DHowett/framework-ec@d5b5b50. Is there a better/official version that I should use instead? For now I’m leaving the ectool fan reporting disabled, and will see if the problem still happens but less frequently…