Title says it all - In the last 3 days I received my FW16, spec’d to the 9’s because I thought I’d treat myself. Sadly, despite my troubleshooting efforts, I’ve experienced an extremely frustrating issue that I believe to be hardware related, however I’d like to see if the community can think of anything that I haven’t yet. I’ve also submitted a support ticket, however I would imagine that due to the holiday season, support may be understandably backlogged.
Essentially what happens is that every 10-30 seconds, my input deck appears to just die. It’ll power on, keyboard/touchpad/numpad all appear to be working, then it’ll fail, reinitialize, then the keyboard/touchpad/numpad are available for a brief time. It is doing this ad infinitum. Because NixOS is technically not a supported OS outside of community support, the below troubleshooting has been done exclusively in W11 24H2.
Troubleshooting done already:
Check all pogo pin connections across all input devices as well as the mid plate for any discernible damage - none found
Remove/reconfigure into every possible orientation for all input devices
Test from within the BIOS to rule out potentially bad drivers within OS - Error is persistent
Remove/re-add midplate multiple times ensuring that the ribbon cable that connects to the mobo is seated correctly and the plastic guides are lined up correctly
Visually inspect midplate ribbon cable and header - No signs of damage/wear (it’s a brand new system so this would be unlikely imo)
Remove/re-attach battery
Boot with battery disconnected to rule out bad battery connection
Disable all non-input deck related I/O in BIOS to rule out a hardware fault elsewhere
Remove my Linux drive entirely to rule out issues dealing with the dual drive setup
Move my W11 install to the primary drive to rule out issues booting only from a 2230 instead of the primary 2280 slot
Test with external mouse and keyboard via USB-C dock - These don’t seem to disconnect when this occurs leading me to believe it is strictly something to do with the midplate or possibly the motherboard
If you have any suggestions at all that I haven’t tried, please let me know. I’m fairly convinced that it’s a hardware failure because the issue is present when in the BIOS, however I’d love to be wrong and be able to use my new computer for more than 15 second intervals
System Specs:
FW16 DIY Edition
CPU: Ryzen 7940HS
Expansion: dGPU Expansion (AMD Radeon 7700S)
RAM: Crucial 32 GB 5600 MHz Kit (2x16) - Part number CT2K16G56C46S5
Storage: Primary - M.2 2280 WD Black 2TB SN850X, Secondary - M.2 2230 WD Black 1TB SN770M
Input Modules: Keyboard, Numpad, Trackpad, 2x Spacers
OS:
Primary Drive - NixOS 24.11 with Linux Kernel 6.12.7 (though I’ve tried to make this work with many different kernel versions)
Secondary Drive - W11 24H2
BIOS: 3.05 (current as far as I can tell by documentation)
With the amount of testing you’ve done, it’s hard to think of what else to check.
Does a device show up in dmesg as connecting and disconnecting repeatedly? The mid-plate has a USB hub chip, and there is a module presence pin for each location. A malfunction in either could cause or trigger disconnection.
The keyboard, numpad, touchpad always disconnect and reconnect together as a group?
Does your fingerprint reader stay connected during this?
~edit~
As I was writing, I realized one thing to check, which I don’t see you mention in your list. Did you try disabling the module presence detection? There is an option for it in the BIOS iirc. I believe there is a chip in the mid-plate that all the module presence pins feed into. It could be malfunctioning.
Are the disconnects pretty regular? You say every 10-30 seconds. No disconnects 0-10 seconds after the input modules come back to life?
If it were truly random, then an individual connection breaking could be a potential cause. Any single module presence pin losing contact should take out everything, all upper modules + touchpad (as a short-circuit prevention safety function).
I didn’t bother to check this, but it’s a great suggestion to do so. I didn’t have Fedora/Ubuntu ready to go and I wasn’t sure if I’d receive any support if I was trying to troubleshoot with a non-supported OS so I didn’t even think to check dmesg logs. That being said, I did monitor it in Device Manager in W11 and see the continual disconnects/reconnects as it was happening.
Upon checking dmesg on my NixOS drive, yes, I see the continual disconnect/reconnects in the dmesg logs as well. Seems like it’s detecting it as a new device continually
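For anyone wanting to put numbers on the disconnect pattern, here is a minimal Python sketch that tallies USB disconnect lines from dmesg-style output and reports the gap between them. The sample log is invented for illustration (the port name "1-4", device numbers, and timestamps are assumptions, not taken from the machine in this thread); it only mimics the usual kernel message shape of "usb <port>: USB disconnect, device number N".

```python
import re

# Matches kernel log lines shaped like:
#   [  123.456789] usb 1-4: USB disconnect, device number 5
DISCONNECT_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+usb (?P<port>[\d.-]+): "
    r"USB disconnect, device number \d+"
)


def disconnect_times(lines):
    """Return {port: [timestamps]} for every USB disconnect line found."""
    events = {}
    for line in lines:
        match = DISCONNECT_RE.match(line)
        if match:
            port = match.group("port")
            events.setdefault(port, []).append(float(match.group("ts")))
    return events


def gaps(timestamps):
    """Seconds between consecutive disconnects on one port."""
    return [round(later - earlier, 3)
            for earlier, later in zip(timestamps, timestamps[1:])]


# Invented sample data, NOT real logs from this thread.
SAMPLE_LOG = """\
[  312.104512] usb 1-4: USB disconnect, device number 7
[  313.001220] usb 1-4: new full-speed USB device number 8 using xhci_hcd
[  327.604512] usb 1-4: USB disconnect, device number 8
[  328.450031] usb 1-4: new full-speed USB device number 9 using xhci_hcd
[  351.104512] usb 1-4: USB disconnect, device number 9
""".splitlines()

if __name__ == "__main__":
    for port, times in disconnect_times(SAMPLE_LOG).items():
        print(f"port {port}: {len(times)} disconnects, gaps (s): {gaps(times)}")
```

On a real system the same functions could be fed saved dmesg output instead of SAMPLE_LOG. Ports that always drop together, or gaps that shrink as the machine warms up, would be worth including in a support ticket.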
Yeah, the whole deck dies altogether, fingerprint reader included. I actually had to disable the fingerprint reader specifically in the BIOS because it was causing the system not to be able to reboot while it was re-initializing when I would attempt to restart the machine. Keyboard/numpad/trackpad don’t seem to cause the system to hang when rebooting for whatever reason.
Seems sort of random within that 0-30 second window after the issue starts. I have noticed that if the machine has been off for any length of time, it’ll take a while (5-10 minutes or so) for this error to crop up. Thinking about it now, could potentially be more mainboard related if it’s somehow related to once the machine reaches a certain thermal threshold. That being said, the fans appear to be working and I’ve not seen the machine reach anything outside of spec temps so I didn’t think about thermals much more beyond that.
I just tried this and it’s been about 30 minutes since the last disconnection. I haven’t used the machine all day due to work needing to get done, so we’ll see how real this is as a solution. But that would point pretty firmly at the chip you mentioned malfunctioning
Thank you for all of your help by the way. Very much appreciate the willingness to provide some suggestions
EDIT:
After further troubleshooting, I can confirm it’s definitely something to do with either the midplate as a whole or the module presence detection chip. Switching the BIOS option for this to “Force On” fixes things. Switch it to “Require Modules” and the lightshow (due to backlights of the modules) begins again.
Here’s hoping support sees this exhaustive troubleshooting and sends me a new midplate
The randomness, I think, makes it harder to pin down.
Regarding the lack of it happening before the laptop warms up, well, temperature can just affect so much.
The fingerprint reader doesn’t actually run through the mid-plate, but it could still be affected by a problem coming from the input deck. There is a USB hub that is shared between both. It’s located on the mainboard, but something on the input deck could be causing it to reset.
At least the BIOS setting has helped it work better while you wait for support.
From what I’ve read of other people’s experiences, I feel like support may not be able to skip the troubleshooting which is outlined for them. Unfortunately.
Seems like BIOS or EC is going crazy because it intermittently sees input modules as missing.
Have you tried setting “Force Power for Input Modules” in Advanced BIOS settings to always enabled? If this solves the issue, then it is definitely this detection that is causing the problem.
Yes, after my initial post, MJ1 had suggested doing so and that was indeed what was happening. I can fully mitigate the issue by switching this setting on. When I turn it to “require modules”, it resumes the freak out.
How many days is it reasonable to wait for a response from Support beyond the automated “We’ve received your ticket” email from the initial ticket submission? It’s been 2 days without any human response yet for the issue detailed above…
With the holidays, people buying more and also support being closed for 2 days for Christmas and 2 days for New Year holidays, they might be working through a bit of a backlog at the moment.
Regular support has resumed, but Linux support hasn’t yet.
To give an idea of possible delay times: back in June, when several new products were announced, they said response times had slipped to 3-5 business days and that they hoped to return to normal soon.
But I haven’t seen anyone mention recently what the current response time is. They have said that they aim for 1 business day iirc, but it will vary with current load or holiday backlog. I recall some time ago it being said that they don’t or didn’t want to use temporary staff which they know they can’t keep on.