Linux + Nvidia performance, next steps to try?

So I’ve been experimenting with my Framework + Razer Core X eGPU + Nvidia GTX 1050 Ti 4GB. Every combination of OS, drivers, window system, etc. has some kind of annoying aspect to it. I have the Framework’s internal display and also my HDMI 1080p monitor, which is connected directly to the Nvidia card.

Ideal: “just works without a crash” whether eGPU is present at boot, added at greeter, added after login, hot unplugged, etc; good frame rate without excessive CPU use (with or without eGPU); Vulkan support…

Here’s what I actually have found out through a long process of trial and error:

  • In EndeavourOS (kernels 5.15.6 and 5.15.7) with Plasma Desktop (my daily driver), using the nvidia-dkms drivers, early KMS (Nvidia modules added to the mkinitcpio MODULES array), and the nvidia-drm.modeset=1 kernel command line option (all from the Arch Linux wiki)

– With X11:
— Pros: Nvidia renders everything, 30-60 fps in desktop, no bogging down on mouse moves or app start, some things use Vulkan (but some fail down to OpenGL)
— Cons: X11 crashes to SDDM greeter on eGPU hot (un)plug, fails to wake from sleep or hibernate if eGPU status changes, etc; 30-40% of one CPU core is constantly in use by Xorg main thread (3-5% when eGPU not present though, but still noticeably higher idle power use than Wayland).

– With Wayland:
— Pros: hotplug / unplug / resume in a different state than it went to sleep in just works most of the time; Intel renders the desktop but I can force things onto the eGPU using prime-run (though some things render as black screen only), < 1% idle CPU use with or without eGPU present, etc.
— Cons: 9-16 fps in idle desktop with GPU connected, regardless of whether it is rendering anything at all, and with all desktop effects turned off. Stutters and old frames brought up when web browsing, playing games, etc. Audio skips and pops in music streaming, games, etc. Basically unusable for anything.

  • In Fedora 35 (kernel 5.15.6), nouveau drivers, Plasma Desktop (aka stock F35 KDE respin)

– With X11:
— Good idle desktop fps, 30-60 range, but:
— fails to boot if eGPU connected at power up, random kernel panic level failures out of graphical session, requiring long power press to reboot (therefore unusably unreliable)

– With Wayland:
— Good desktop fps when truly idle, but can get bogged down (by something as simple as moving the mouse) to single digit fps with only the desktop running. Hit or miss what programs will also bog fps down.

I would love to get EndeavourOS + Wayland up to a functional fps for desktop tasks, gaming, etc. If anyone has any good ideas of what to try next to get that working better, please let me know! Been through all the wikis, forums, etc that my duck-duck-fu could dig up, and nothing has really improved that situation. Consensus seems to be that Wayland and Nvidia are just oil and water, despite recent advances in 495.44 driver …!!!

I could live with my current solution of logging into Endeavour Wayland when undocked, logging out to SDDM during (un)docking, and Endeavour X11 when docked if need be. But it’s worth a try to get one setup working for all use cases instead. So that means either getting hot (un)plug and lower idle CPU use on X11, or better fps on Wayland.

Looking forward to seeing what the brain trust here knows that I don’t…

Have you tried this?

@Jean-Marc_Le_Roux , yes I’ve experimented with the timing of when eGPU should be connected during boot/login process, with every combination of software I’ve detailed above. I’m also on BIOS 3.06 from the factory on my batch 5, so no issues there.

I really don’t think it’s a hardware issue with the Framework, cable, eGPU housing, or the Nvidia card. Like I said, I get pretty reasonable performance from nvidia-dkms and X11. I don’t think it is a misconfiguration of any of the things I have installed, but I’m open to suggestions that I may not have run across in my research. If anything, I think the various drivers and window systems themselves are just not well optimized yet. I’m also open to suggestions of other combinations of software to try, if someone else has their setup working better in any/all of the ways I listed as important to me under my “ideal case” above.

Thanks

I know this may sound blasphemous but how does GNOME + Wayland run on your setup?

Nope, not blasphemous at all. One thing I really like about the Linux ecosystem is that if one project isn’t cutting the mustard, you have the opportunity to replace it, fork it, fix it, etc!

Did a very minimal install of gnome-shell, gnome-session, and their marked necessary dependencies only in EndeavourOS. Might have possibly missed a gnome to wayland helper package, but…

Gnome on Wayland recognized that there was a second display attached, but no matter what I did there was no way to actually bring up signal to the external monitor that I could find. Power cycling the monitor, destroying and manually re-specifying the output in xrandr, etc had no effect.

Gnome in X11 works about the same way as KDE on X11, decent performance but high idle CPU use, inconsistent Vulkan availability, etc.

Well, the plot sickens. I’m now seeing high residual CPU use in Wayland (kwin_wayland process) and a way worse desktop framerate (now roughly 20 fps) with the eGPU plugged in. Even worse than X11, in fact. Weird Weird Weird.

Kernel 5.15.10, up to date wayland packages in arch, etc…

Gonna necro because I’m tearing my hair out over Fedora GNOME + Xorg suspend issues with my NVIDIA egpu. Did you manage to find a combination that works for you?

I’m okay with no hot plug behavior, I just want to be able to suspend my laptop while it’s connected to the egpu and occasionally shutdown to disconnect if I want to use as a laptop.

I really don’t suspend when it’s at my desk and plugged in, I just let it idle with screens off.

@Be_Far , suspending while eGPU is connected is working for me, albeit a bit more slowly to actually get to suspend.

My setup:

  • EndeavourOS (arch based)
  • kernel 5.18.1-arch1-1
  • nvidia-dkms 515.48.something
  • Sonnet 550 eGPU enclosure
  • nVidia 1050Ti card
  • X11, not Wayland
  • KDE Plasma

Slow I can handle. Thank you so much, I guess I’m nuking Fedora and trying my hand with an Arch distro!

Give Endeavour a try. I’ve been very satisfied with it for the past 5 months or so. Very few wrinkles in the rug getting to initial setup, very few issues on the Framework, and those few issues are usually fairly easy to solve or mitigate.

I’m going to try it first exactly with your setup, and if that works, then I’ll try it on the DE I fell in love with (GNOME).

Do you by any chance use egpu-switcher?

No on egpu-switcher. I did try it, and used the examples it generated to inform a single Xorg conf that adapts to both cases (though not without a reboot; the kernel and/or Nvidia drivers panic if the eGPU goes away).

Would you be willing to share that configuration with me? Thanks! That way I can modify for my specific hardware/addresses and see if I can get my desired behavior without switcher. Less software involved is better for troubleshooting. Endeavour installing as we speak…

Edit: X11 reloaded on hotplug. I haven’t installed the Nvidia drivers yet, so I’m assuming that’s a nouveau behavior. For some reason I can’t set my internal screen to a smaller resolution than 2256x1504, and the info box says that’s not possible on Wayland but is on X11 (1000% sure I’m on X11). Weird. We’ll see how the Nvidia install goes.

Edit 2: Froze on the restart after installing Nvidia. I forced a shutdown, and now the external display isn’t detected and reverting to the nouveau driver doesn’t help, so I’m reinstalling Endeavour to try again. Maybe I bricked something?

Edit 3: Reinstalled Endeavour, installed drivers, waited for it to restart (took a really long time). My external monitor has a few lines from boot on it but isn’t detected as a display. Irritating. All nvidia- commands hang, though the card is detected in lspci. nvidia-xconfig can’t find the Xorg configuration. I tried making my own xorg.conf, but that ended in hangs on shutdown and startup.

I’m at a loss as to what’s causing this. Everything works perfectly on Windows 10, and at this point I’m considering just reclaiming the 200 GB on my SSD that I stole from the Windows install.

EDIT 4: EXTERNAL DISPLAY IS DETECTED! Here were my exact steps:
Install Endeavour
Install nvidia-installer-dkms and use it to install the drivers
Nothing works, so try making an Xorg conf
Hangs on boot, so go into a tty, delete the Xorg conf, revert to nouveau with nvidia-installer-dkms, and restart
Install and configure egpu-switcher
Install nvidia-inst and use it to install the Nvidia drivers, then restart
While restarting, read about a bug with Intel CPUs on kernel 5.18 and Nvidia
Quickly add ibt=off and nvidia-drm.modeset=1 (for good measure) to the kernel args from the GRUB menu

lspci -k says I am indeed on the nvidia driver. I’m very happy!
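For anyone retracing these steps, it may help to confirm that both kernel arguments actually reached the running kernel before testing anything else. Below is a minimal Python sketch, assuming the standard /proc/cmdline interface; the helper names `missing_kernel_args` and `check_running_kernel` are made up for illustration and are not part of any tool mentioned in this thread.

```python
from pathlib import Path
from typing import Iterable, List

# The two arguments added from the GRUB menu in the steps above.
REQUIRED_ARGS = ("ibt=off", "nvidia-drm.modeset=1")


def missing_kernel_args(cmdline: str, required: Iterable[str] = REQUIRED_ARGS) -> List[str]:
    """Return the required arguments that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [arg for arg in required if arg not in present]


def check_running_kernel(path: str = "/proc/cmdline") -> List[str]:
    """Check the command line the running kernel was booted with."""
    return missing_kernel_args(Path(path).read_text())
```

Calling `check_running_kernel()` on the laptop returns an empty list when both arguments are present. Note that arguments typed in at the GRUB menu normally apply to that one boot only, so a non-empty result after a later reboot would suggest they were never made permanent.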

Edit 5: I am unable to get the laptop to sleep. nvidia-suspend and nvidia-resume are enabled, and /usr/lib/modprobe.d/nvidia-power-management.conf contains the line NVreg_PreserveVideoMemoryAllocations=1. It just immediately turns back on. sigh…

Edit 6: I found out I was doing modprobe configuration lines wrong, and fixed it to read options nvidia NVreg_PreserveVideoMemoryAllocations=1. Updated initramfs and restarted, no change. My logs are proving unhelpful:
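The mistake described here (a bare parameter line instead of an "options nvidia ..." line) is easy to miss by eye, so here is a rough Python sketch that flags it. It also includes a helper for reading the value the loaded driver reports; that part assumes /proc/driver/nvidia/params uses a "Name: value" layout, which is how I understand it to look but is worth verifying on your own machine. Both function names are hypothetical.

```python
from typing import List, Optional


def bare_parameter_lines(conf_text: str) -> List[str]:
    """Return lines that set an NVreg_ parameter without an 'options <module>' prefix."""
    problems = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.split()[0].startswith("NVreg_"):
            problems.append(line)
    return problems


def effective_param(params_text: str, name: str) -> Optional[str]:
    """Look up one parameter in text shaped like 'Name: value' lines.

    Assumption: this matches the layout of /proc/driver/nvidia/params.
    """
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == name:
            return value.strip()
    return None
```

Feeding the contents of the modprobe.d file to `bare_parameter_lines` should return an empty list once the line reads "options nvidia NVreg_PreserveVideoMemoryAllocations=1". If `effective_param` still does not report 1 after a reboot, the option is probably not being picked up at module load time.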

Jun 08 09:00:16 bpc-endeavour systemd[1]: Starting NVIDIA system suspend actions...
Jun 08 09:00:16 bpc-endeavour suspend[6713]: nvidia-suspend.service
Jun 08 09:00:16 bpc-endeavour logger[6713]: <13>Jun  8 09:00:16 suspend: nvidia-suspend.service
Jun 08 09:00:18 bpc-endeavour systemd[1]: nvidia-suspend.service: Deactivated successfully.
Jun 08 09:00:18 bpc-endeavour systemd[1]: Finished NVIDIA system suspend actions.
Jun 08 09:00:18 bpc-endeavour audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=nvidia-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 08 09:00:18 bpc-endeavour audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=nvidia-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 08 09:00:18 bpc-endeavour systemd[1]: nvidia-suspend.service: Consumed 1.288s CPU time.
Jun 08 09:00:18 bpc-endeavour kernel: audit: type=1130 audit(1654696818.003:146): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=nvidia-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 08 09:00:18 bpc-endeavour kernel: audit: type=1131 audit(1654696818.003:147): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=nvidia-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 08 09:00:18 bpc-endeavour systemd-sleep[6718]: Entering sleep state 'suspend'...
Jun 08 09:00:18 bpc-endeavour kernel: PM: suspend entry (s2idle)
Jun 08 09:00:20 bpc-endeavour kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Jun 08 09:00:20 bpc-endeavour kernel: PM: suspend exit
Jun 08 09:00:22 bpc-endeavour systemd[1]: systemd-suspend.service: Deactivated successfully.
Jun 08 09:00:22 bpc-endeavour audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 08 09:00:22 bpc-endeavour audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 08 09:00:22 bpc-endeavour systemd[1]: systemd-suspend.service: Consumed 1.847s CPU time.
Jun 08 09:00:22 bpc-endeavour suspend[6763]: nvidia-resume.service
Jun 08 09:00:22 bpc-endeavour logger[6763]: <13>Jun  8 09:00:22 suspend: nvidia-resume.service
Jun 08 09:00:22 bpc-endeavour kernel: audit: type=1130 audit(1654696822.638:148): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jun 08 09:00:22 bpc-endeavour kernel: audit: type=1131 audit(1654696822.638:149): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
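One thing the log above does show is how quickly the machine comes back: the kernel logs "PM: suspend entry (s2idle)" and then "PM: suspend exit" almost immediately afterwards. The following Python sketch pairs those two messages and reports the gap. It assumes the default journalctl short format used above; the year is not in that format, so a placeholder year is supplied only to make the timestamps parseable, and the function name `suspend_cycles` is made up.

```python
import re
from datetime import datetime
from typing import List, Tuple

# journalctl's short format omits the year; this placeholder exists only so
# timestamps can be parsed and subtracted.
ASSUMED_YEAR = 2022

ENTRY = re.compile(r"^(\w{3} +\d+ [\d:]{8}) .*kernel: PM: suspend entry \((\S+)\)")
EXIT = re.compile(r"^(\w{3} +\d+ [\d:]{8}) .*kernel: PM: suspend exit")


def _stamp(text: str) -> datetime:
    return datetime.strptime(f"{ASSUMED_YEAR} {text}", "%Y %b %d %H:%M:%S")


def suspend_cycles(log_text: str) -> List[Tuple[str, float]]:
    """Pair each 'suspend entry' with the next 'suspend exit'.

    Returns (sleep_mode, seconds_between_entry_and_exit) tuples.
    """
    cycles = []
    pending = None  # (mode, entry_time) of an entry still waiting for its exit
    for line in log_text.splitlines():
        if m := ENTRY.match(line):
            pending = (m.group(2), _stamp(m.group(1)))
        elif (m := EXIT.match(line)) and pending:
            mode, started = pending
            cycles.append((mode, (_stamp(m.group(1)) - started).total_seconds()))
            pending = None
    return cycles
```

The timestamps are only accurate to the second, so treat the result as a rough indicator. A gap of a couple of seconds, as in the log above, points at something waking or aborting the suspend right away rather than at the nvidia-suspend service itself, which reports success.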

Edit 7: Absolutely gutted. I swapped to deep sleep and guess what shows up? The exact same behavior I get from Fedora. 24 hours of my life and no change. Done with Linux on laptops, nuking it for another 200 GB of Windows space. Thanks for all the help anyway.
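For others who try the same switch to deep sleep, one quick sanity check is whether the change actually took effect. The kernel lists the available suspend modes in /sys/power/mem_sleep and marks the active one with square brackets, for example "s2idle [deep]". Here is a small Python sketch that parses that text; the function name `parse_mem_sleep` is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_mem_sleep(text: str) -> Tuple[Optional[str], List[str]]:
    """Split the contents of /sys/power/mem_sleep into (active_mode, all_modes).

    The active mode is the one wrapped in square brackets.
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available
```

Reading the file and passing its contents to `parse_mem_sleep` shows both what the firmware offers and what is currently selected. If "deep" is missing from the list entirely, the platform is not exposing that mode and the setting cannot be switched from the OS side.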