Linux becomes unresponsive randomly (various distros)

So, an interesting problem I’ve not had until now. The basic setup is the mid-tier i7 running Windows and Fedora on the SSD, along with a “side” distro on the 250GB expansion card.

The issue now seems to affect any distro on the expansion card: it freezes randomly after an indeterminate amount of time. After using the laptop for hours, I may get the following problem:


This is the first time I’ve noticed one icon (WhatsApp) not greying out during this issue.

Nonetheless, I can click icons and get their associated dropdowns (Wi-Fi, right-click menus), and I can press Ctrl-Alt-Del and get the reboot menu, but nothing happens when I select an option.

So basically I can interact, but nothing will run when this happens. The only fix is a hard-shutdown and boot.

I’ve installed at least half a dozen distros on the expansion card to try them out. The last few this issue happened on were Ubuntu 21.10 (current), Elementary OS, and Manjaro. I don’t recall the issue before then.

My gut says overheating of the expansion card, as my SSD performance is flawless. Otherwise, no ideas.

Ideas?

Are you using battery saving measures on the expansion card’s OS?

I’ve heard of e.g. TLP being overly eager to save power and turning off the USB controller.
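If you want to check the USB power side without installing anything, here is a rough Python sketch that lists the runtime power setting for each USB device. It assumes the usual sysfs layout (/sys/bus/usb/devices/<device>/power/control, where "auto" means the kernel may autosuspend the device and "on" means it stays powered). The function name is just something I made up for illustration.

```python
from pathlib import Path


def usb_power_control(sysfs_usb_root="/sys/bus/usb/devices"):
    """Return {device_name: control_value} for USB devices exposing power/control."""
    result = {}
    root = Path(sysfs_usb_root)
    if not root.is_dir():
        return result  # not Linux, or sysfs not mounted
    for dev in sorted(root.iterdir()):
        control = dev / "power" / "control"
        try:
            result[dev.name] = control.read_text().strip()
        except OSError:
            continue  # entry has no power/control file, or it is unreadable
    return result


if __name__ == "__main__":
    for name, value in usb_power_control().items():
        note = "autosuspend allowed" if value == "auto" else "kept powered"
        print(f"{name}: {value} ({note})")
```

If the expansion card's device shows "auto", that alone doesn't prove anything, but it would make the power-saving theory worth chasing.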

Nope, all the defaults in the GUI (balanced power saving) and not using any additional command-line utilities.

Okay, I’m just going to pull something out of my ass because it’s what I’d try next:

Keep a terminal open at /dev/ on a TTY and run ls every so often until you get the freeze. The fact that the GUI still kinda works indicates that the kernel hasn’t panicked, and the fact that you can interact with it indicates that there’re still some goodies working in the background. Very likely a terminal that’s already been loaded into memory will still be in memory.

If ls fails when you get the freeze with some kind of ‘command not found’ or something, then try again with a tmpfs prepared ahead of time: mount a tmpfs, dump busybox in it, and when the freeze hits use the busybox from tmpfs. If the busybox ls works, then look around and see if your root partition’s device is still in the /dev/ list. If the busybox ls doesn’t work, then tmpfs got dropped, which means that something in the kernel got screwed up (maybe? I think?) and… uh… well I’m out of ideas from there (for now).

EDIT: If you don’t see your root device, then there’s your problem. If you do see it, then there’s a different problem and the same “well I’m out of ideas from there (for now)” statement above applies.
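The "is the root device still in /dev/" check can also be sketched in Python, in case an already-running interpreter is easier to keep around than repeated ls calls. This is only a sketch under some assumptions: the root filesystem is mounted straight from a /dev/... node (not LVM, overlay, or a network root), /proc/mounts is readable, and the interpreter stays usable once the disk is gone (which is the same bet as with the terminal). The function names are made up for illustration.

```python
import os
import time


def root_source(mounts_text):
    """Return the source mounted on '/' in /proc/mounts-style text, or None."""
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "/":
            source = fields[0]  # a later entry for '/' overrides an earlier one
    return source


def root_device_present(mounts_text, exists=os.path.exists):
    """Return (source, present); present is None when the source is not a /dev node."""
    source = root_source(mounts_text)
    if source is None or not source.startswith("/dev/"):
        return source, None  # cannot tell from /dev alone
    return source, exists(source)


def watch(count=720, interval=5.0, mounts_path="/proc/mounts"):
    """Print the root device status `count` times, `interval` seconds apart."""
    for _ in range(count):
        with open(mounts_path) as f:
            source, present = root_device_present(f.read())
        print(time.strftime("%H:%M:%S"), source, present, flush=True)
        time.sleep(interval)
```

Start watch() in a terminal before the freeze. If the last lines flip from True to False, the device node went away, which would point at the expansion card dropping off the bus.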

EDIT: AND DISABLE SWAP. I think. Wouldn’t hurt, probably will help.
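To see whether any swap is active in the first place, a small sketch that parses /proc/swaps might help. It assumes the usual layout of that file (one header line, then whitespace-separated Filename, Type, Size, Used, Priority columns, sizes in KiB). The function name is hypothetical.

```python
def active_swaps(swaps_text):
    """Parse /proc/swaps-style text into (filename, type, size_kib, used_kib) tuples."""
    entries = []
    for line in swaps_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 4:
            entries.append((fields[0], fields[1], int(fields[2]), int(fields[3])))
    return entries


if __name__ == "__main__":
    try:
        with open("/proc/swaps") as f:
            swaps = active_swaps(f.read())
    except OSError:
        swaps = []  # not Linux, or /proc not available
    if not swaps:
        print("no active swap found")
    for name, kind, size_kib, used_kib in swaps:
        print(f"{name} ({kind}): {used_kib} of {size_kib} KiB used")
```

If something shows up there and it lives on the expansion card, running swapoff -a as root should turn it off until the next boot.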

Great suggestions, I’ll give them a shot tomorrow. I’m back in Windows now; as I was using GIMP to edit a picture, its functionality stopped working (in this case, deleting a selected area of the picture). As with the rest of the UI, I could click buttons and dropdowns, but to no effect.

Moved over to the browser window (as I’d had the notification of your post while I was editing, before it barfed) and “no connection.”

Hard shutdown and boot into Windows to at least check in before I give in for the evening.

Thanks again, will report back as I wait for another fail tomorrow.

Wait wait

“As with the rest of the UI, I could click buttons and dropdowns, but to no effect.”

That was from within a single already-loaded-and-running program (GIMP)? That’s… a possible spanner in my ideas.

Are you running Wayland or Xorg?

And: when you say ‘no effect’, do you mean ‘no dropdown menus appeared’ or ‘the menus showed up but the menu items did nothing’? I ask because dropdown menus are sometimes implemented as new windows, and IIRC at least one of the Wayland/Xorg ecosystems has a subsystem which uses a file (which could very well be on the root device) to store the address of the display server.
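If you’re not sure which one you’re on, here is a quick sketch that guesses from the environment. It assumes the session sets XDG_SESSION_TYPE (as systemd-based logins normally do) and falls back to WAYLAND_DISPLAY and DISPLAY. WAYLAND_DISPLAY is checked first because a Wayland session with XWayland usually sets both. The function name is made up.

```python
import os


def session_type(env=None):
    """Guess 'wayland', 'x11', or 'unknown' from session environment variables."""
    env = os.environ if env is None else env
    declared = env.get("XDG_SESSION_TYPE", "").lower()
    if declared in ("wayland", "x11"):
        return declared
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"  # checked before DISPLAY because XWayland sets DISPLAY too
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


if __name__ == "__main__":
    print(session_type())
```

Run it from a terminal inside the graphical session; from a bare TTY it will most likely print "unknown".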

Yeah, I kinda thought that’d be a wrench in the works, although possibly enlightening.

Dropdowns did indeed appear, but clicking options did nothing (click effect displayed, no action).

Wayland.

I’ll see if I can make it break again and report back. Thanks!