[RESPONDED] Booting Debian

I got my FW16 a few days back, and now I’m trying to get Debian to boot on it. Debian simply works for me; I can’t stand Ubuntu and don’t really mind other distros. Eventually I want to run Debian Testing on it. Unfortunately the Testing live ISOs lack the installer (the one you can launch after booting into the live environment, not the Win95-looking one that you can alternatively boot into). Since I want an encrypted btrfs partition as / and a swapfile instead of a swap partition (fingers crossed that I can set up hibernation to a swapfile in my environment), I opted to install from the Debian 12.5 Gnome live image. The live environment works without issues, though you shouldn’t try to browse the web: that will crash the environment and trigger a reboot.

I wasn’t surprised at all that Debian Stable wouldn’t boot after installation, though, as AMD recommends at least Linux 6.4. It also complained that the firmware “amdgpu/gc_11_0_1_me” couldn’t be loaded. I chroot’ed into the install and installed linux-image-amd64 and firmware-amd-graphics from the testing repo. The complaint about the firmware didn’t disappear, and I learned that while testing and sid ship the firmware from June 2023, Linux needs at least the 09/2023 firmware to handle RDNA3. So I opted to get the 04/2024 firmware from here and copy it into my installation (into both /lib/firmware/amdgpu and /usr/lib/firmware/amdgpu, as the guide for the FW13 that I found on Reddit suggested the former, while the firmware-amd-graphics package uses the latter and there don’t seem to be symlinks between them).

Now the only error message left at boot, beyond which the system wouldn’t progress, was

cros_ec_lpcs cros_ec_lpcs.0: EC ID not detected

Everywhere the recommendation was to just blacklist the cros_ec_lpcs kernel module. First I tried using this guide from Archwiki, but since the error message didn’t disappear, I added the necessary command according to this to /etc/default/grub, then ran update-grub and update-initramfs -ck all.
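For reference, the modprobe.d side of such a blacklist can be sketched roughly as below. This is only an illustration, not the exact steps from the linked guides: the helper name blacklist_module, the drop-in file name and the optional target-directory argument are all made up here. Only the "blacklist <module>" line format and the /etc/modprobe.d location are standard.

```shell
# Sketch: write a modprobe blacklist drop-in for one module.
# $1 = module name, $2 = target directory (defaults to /etc/modprobe.d).
# The file name "blacklist-<module>.conf" is an arbitrary, hypothetical choice.
blacklist_module() {
    module="$1"
    conf_dir="${2:-/etc/modprobe.d}"
    printf 'blacklist %s\n' "$module" > "$conf_dir/blacklist-$module.conf"
}
```

Run as root, something like `blacklist_module cros_ec_lpcs` would create the drop-in. On Debian the initramfs carries its own copy of the modprobe configuration, so it has to be regenerated afterwards (for example with update-initramfs -u) before the entry takes effect early in boot.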

Now I’m not getting any errors anymore, but the system just stops progressing the boot process at a black screen with a blinking white underscore-cursor.

Does anybody know how I can progress to boot into the system, at least working well enough to verify the system can boot, connect to WiFi and update all packages to testing?

I also fixed the missing firmware error by downloading the missing file from git.kernel.org.

As to “cros_ec_lpcs cros_ec_lpcs.0: EC ID not detected”: not every error issued by the kernel deserves attention. I run Debian Sid (installed via debian-testing-amd64-netinst.iso ) and it works fine-ish without trying to get rid of that message.

Once I’m actually able to boot into my system, I can remove the blacklisting for it. Right now, as long as the possibility exists that it might prevent me from booting, I will keep the module disabled.

I just updated libgl1-mesa-dri, libglx-mesa0, mesa-vulkan-drivers and xserver-xorg-video-all to Debian Testing, yet no success. The strange thing is that the live environment loads without any (major) issues, but gdm won’t load in the installed environment. Trying to switch tty doesn’t change anything either, so the bootup probably stops before that.

EDIT: So I just got the idea to select the rescue mode in Grub. I get the message “Cannot open access to console, the root account is locked.” Now how do I fix this? This usually happens when people mess up their fstab. But I never touched it, and both it and crypttab look correct. There was also the recommendation to create the root account (sudo passwd), which I would have done either way, but it won’t let me from chroot, complaining about not having a console.

Hm, that all seems odd.

I don’t have my FW16 yet, couple more weeks I think (Batch 12). I had a bit of trouble figuring out your timeline, did you get 12.5 booted (even with firmware issues)? You can upgrade to Testing from there you know.

See How to upgrade to Debian (next-stable) Testing heading from DebianTesting - Debian Wiki.

Not really. The live ISO does boot. But after installation, it only booted as far as a bunch of error messages (the cros_ec_lpcs error, the firmware error, and because of an improper kernel upgrade it only showed a BusyBox terminal). I was able to fix those issues by chrooting into the system, but I don’t really want to upgrade fully to testing from there. I know how to complete a kernel update from such an environment, but only because I know that refreshing the initramfs is skipped when upgrading inside a chroot and how to refresh it without breaking the bootup in different ways. But now I’m stuck with the blinking cursor. When I select the advanced options in Grub and pick kernel 6.6.15 in rescue mode to show logs, it tells me that the root account is locked and thus it can’t launch a console.

Huh. Well, I’ll be getting mine in a few weeks, so I’ll see what my experience is like.

It looks like I can’t use preformatted text, otherwise I get a 403 error…

Ok, I’ve made some progress. I was able to set the boot password, after adding this little script to my chroot setup workflow:

# for i in /dev /dev/pts /proc /sys /sys/firmware/efi/efivars /run; do sudo mount -B "$i" "/mnt$i"; done

this can be found here: GrubEFIReinstall - Debian Wiki
Setting the boot password revealed this new error message: ucsi_acpi USBC000:00: GET_CURRENT_CAM command failed

I also booted into the Linux 6.6.15 rescue shell, and after adding init=/bin/sh to the kernel command line options in /etc/default/grub (and running update-grub afterwards) I was able to log in as root and run journalctl -xb. The result can be found here: https://pastebin.ai/wbui9eqimm

If the crypttab and fstab are required for further investigation because the Debian Installer may have a bug, here they go:
Crypttab: https://pastebin.ai/lauuazwrqr

Fstab: https://pastebin.ai/xkbdqrouzr

Also I noticed something strange after running update-grub: https://pastebin.ai/4lujih28q9
Maybe /dev/sda1 is just the USB stick I boot from, but other than that, the only storage devices should be NVMe SSDs.

EDIT: and dmesg: https://pastebin.ai/6wk9wp0vkn

In your attempt to hide an easily-ignorable warning message, you have blocked yourself from having access to charge control, LED control and fan speed reporting in the future. It’s fine to leave that message be. :slight_smile:

Kind of irrelevant right now if I’ve got bigger issues though…

But at least somebody is explaining what it even does. Even here in the community it’s made to look like it’s nothing of relevance if you’re not planning to modify the controller firmware.

I think I’ve found the issue. Question is, why does it even happen in the first place?
In fstab, the root subvolume is mounted with these parameters:
/dev/mapper/luks-775ea946-6797-4c4d-a042-72924309f3d2 / btrfs subvol=/@,defaults,noatime,space_cache,autodefrag,discard,compress=lzo 0 0

but systemd-rfkill keeps complaining that both /var/lib and something under /lib are ro filesystems. This isn’t an immutable distro, so that shouldn’t be a thing. But /proc/mounts does agree:
/dev/mapper/luks-775ea946-6797-4c4d-a042-72924309f3d2 / btrfs ro,relatime,ssd,space_cache=v2,subvolid=256,subvol=/@ 0 0

Or for a full output:

sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,nosuid,relatime,size=16031752k,nr_inodes=4007938,mode=755,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=3215816k,mode=755,inode64 0 0
/dev/mapper/luks-775ea946-6797-4c4d-a042-72924309f3d2 / btrfs ro,relatime,ssd,space_cache=v2,subvolid=256,subvol=/@ 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
efivarfs /sys/firmware/efi/efivars efivarfs rw,nosuid,nodev,noexec,relatime 0 0
bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=19120 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
tracefs /sys/kernel/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
configfs /sys/kernel/config configfs rw,nosuid,nodev,noexec,relatime 0 0
ramfs /run/credentials/systemd-sysctl.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
ramfs /run/credentials/systemd-tmpfiles-setup-dev.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
tmpfs /tmp tmpfs rw,noatime,inode64 0 0
/dev/mapper/luks-775ea946-6797-4c4d-a042-72924309f3d2 /home btrfs rw,noatime,ssd,space_cache=v2,subvolid=257,subvol=/@home 0 0
/dev/nvme0n1p2 /boot/efi vfat rw,noatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro 0 0
ramfs /run/credentials/systemd-tmpfiles-setup.service ramfs ro,nosuid,nodev,noexec,relatime,mode=700 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda1 /media vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro 0 0

Ok, this seems to be a general btrfs error. It seems the mount option “space_cache” defaults to v1, but v2 is needed: https://github.com/btrfs/btrfs-todo/issues/29
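To compare what fstab asks for with what the kernel actually mounted, the option field can be pulled out of /proc/mounts directly. A minimal sketch, assuming the usual mounts format (device, mount point, type, options, two numbers); the helper names mount_opts and is_readonly are hypothetical:

```shell
# Print the option field (4th column) for the mount point given as $1.
# Reads mounts-format text (e.g. /proc/mounts) from stdin. If the mount
# point appears more than once, the last entry wins, since it is the one
# stacked on top.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }'
}

# Succeed if the comma-separated option list in $1 contains "ro".
is_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Typical use would be `opts=$(mount_opts / < /proc/mounts); is_readonly "$opts" && echo "/ is read-only"`. If / does turn out to be read-only in a rescue or init=/bin/sh shell, `mount -o remount,rw /` is the usual way to make it writable for that session.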

Now with this issue solved, there isn’t really much left in the journal and in dmesg with the severity level of error, yet I can’t manage to boot to the desktop.

dmesg -HTl err:

[Apr21 15:27] cros_ec_lpcs cros_ec_lpcs.0: EC ID not detected
[  +2,702570] ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-5)
[  +5,163814] ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-110)

journalctl -b --priority=3:

Apr 21 15:27:21 framework16 kernel: cros_ec_lpcs cros_ec_lpcs.0: EC ID not detected
Apr 21 15:27:23 framework16 kernel: ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-5)
Apr 21 15:27:24 framework16 bluetoothd[869]: src/plugin.c:plugin_init() Failed to init vcp plugin
Apr 21 15:27:24 framework16 bluetoothd[869]: src/plugin.c:plugin_init() Failed to init mcp plugin
Apr 21 15:27:24 framework16 bluetoothd[869]: src/plugin.c:plugin_init() Failed to init bap plugin
Apr 21 15:27:24 framework16 bluetoothd[869]: profiles/sap/server.c:sap_server_register() Sap driver initialization failed.
Apr 21 15:27:24 framework16 bluetoothd[869]: sap-server: Operation not permitted (1)
Apr 21 15:27:28 framework16 kernel: ucsi_acpi USBC000:00: ucsi_handle_connector_change: GET_CONNECTOR_STATUS failed (-110)

None of them look like they should be blocking anything.

If you’re manually making nodes like that, something VERY core is wrong with your install. These should be automatically created at bootup time during the initramfs.

Maybe you’ve got a busted systemd install.

Don’t just download the missing files. You should take a whole snapshot. And I’m a broken record saying this over and over but please get Debian to fix the snapshot and update it.

They do stable release installer images and this could be fixed once and for all for everyone if they would just update their Linux firmware package.

What nodes? I don’t think I’m making any nodes…

I don’t see how this should be the case; systemd seems to be operating just fine. The only errors that I’m still getting are from ucsi_acpi and bluetoothd.

The only files I manually downloaded were the amdgpu firmware. Do you mean I should just take the whole archive I downloaded and copy over all the firmware files, or what are you trying to say?

And of course it would be ideal if at least sid didn’t only have firmware that’s close to being a year old. There has been an entry in the bug tracker for firmware-amd-graphics since September, but non-free-firmware doesn’t seem to be the priority. Also, my guess is most Debian devs have their hands full with the time_t transition right now, so I have no idea how many resources they currently have available for this.

I mean you should take all the amdgpu firmware, not just the missing files. There are very important fixes in other files too.

To me it’s really weird they do backport kernels but not backport firmware and don’t prioritize this. I really think you guys should make more noise.

I actually did copy over the whole amdgpu directory. Anyway, after copying over all the firmware files I finally noticed the missing piece. The desktop still wasn’t showing because I needed to remove the init=/bin/sh part again from the grub command line in /etc/default/grub. I have now got to the desktop and am currently running the full upgrade to testing.
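For anyone hitting the same wall: a leftover init= override is easy to spot on the running kernel command line. A minimal sketch, where has_init_override is a hypothetical helper name and the only real interface used is /proc/cmdline:

```shell
# Succeed if the kernel command line passed as $1 contains an init=
# parameter as its own token (so rdinit=... does not count).
has_init_override() {
    case " $1 " in
        *" init="*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On a booted system, `has_init_override "$(cat /proc/cmdline)" && echo "init= override still active"` shows whether the shell override is still in effect. After removing it from /etc/default/grub, update-grub has to be run again for the change to reach the boot menu.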

I think you’ll be pleased with the result. I got my FW16 a few days ago and it’s happily running Debian ‘testing’ (iGPU, no dGPU).

Once I’m able to set everything up, I’m sure I will be too. There are a few things I’ll probably have to dig into a bit, though. The display’s ICC file (even though its usage beyond sRGB will have to wait for future Gnome versions), a version of fwupd that’s compatible with testing (it’s missing from the repo; only stable and sid have it, and neither is compatible with the rest of the packages, so I’ll need to find a way to compile it) and whatnot come to mind.

I run Debian Trixie (Testing) without any issues [1] during install or boot, the main differences from your setup being ext4 instead of btrfs and a swap partition instead of a swapfile. fwupd is indeed in the testing repo (version 1.9.14-1), so I wonder whether your repos are correctly set up if you’re missing it.

A word of warning: if you’re using Secure Boot, Debian’s kernels do not ship with any patches for enabling hibernation under any configuration (even an encrypted swap partition). There is a way around it, talked about here, if you’re comfortable installing a kernel module that bypasses the restriction.
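Assuming the hibernation restriction comes from kernel lockdown (which Debian kernels enable under Secure Boot), the current state can be read from /sys/kernel/security/lockdown, where the active mode is shown in square brackets. A small sketch for picking it out; the function name active_lockdown_mode is hypothetical:

```shell
# Print the active lockdown mode from a string such as
# "none [integrity] confidentiality" (the format of
# /sys/kernel/security/lockdown). Fails if no bracketed entry is found.
active_lockdown_mode() {
    state="$1"
    case "$state" in
        *\[*\]*)
            mode="${state#*\[}"    # drop everything up to and including "["
            mode="${mode%%\]*}"    # drop "]" and everything after it
            printf '%s\n' "$mode"
            ;;
        *) return 1 ;;
    esac
}
```

For example, `active_lockdown_mode "$(cat /sys/kernel/security/lockdown)"` printing integrity or confidentiality would suggest that hibernation is being refused by lockdown rather than by a swap misconfiguration.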

[1] after, like you, replacing the amdgpu firmware with the at-the-time latest from the kernel git