I bought a Framework 13 laptop with an AMD Ryzen 7840U and installed Debian Bookworm. The out-of-the-box kernel 6.1.64 worked quite well, and I recompiled the vanilla 6.1.67 kernel with all the options that should be set for a Framework laptop (collected from different forum threads). With this kernel everything works fine (WLAN, standby, camera, etc.).
Then I tried to compile a newer kernel. Up to 6.2.16 WLAN still works as expected, but all newer kernels I tried (6.3.13, 6.4.16, 6.5.13, 6.6.6) have a strange connection problem.
I did a very simple test: I pinged a local PC from my Framework laptop, and then in the reverse direction pinged the laptop from the local PC. The ping from my laptop to my PC always works as expected, with consistent round-trip times. But the ping from the local PC to my laptop gets worse with kernels >= 6.3: I see lost packets and RTTs that sometimes exceed 3000 ms.
The strange thing is that the problem only occurs when the ping is started from the local PC; from the laptop it always works. I tried another local PC and the result is the same.
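To compare kernels with numbers rather than impressions, the summary that ping prints at the end of a run can be parsed and logged per kernel version. The following is only a minimal sketch: it assumes the summary format of iputils ping on Linux, the function name `parse_ping_summary` is made up for illustration, and the sample text contains invented values, not measurements from this thread.

```python
import re

# Summary lines as printed by iputils ping on Linux (assumed format).
LOSS_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) received.*?([\d.]+)% packet loss"
)
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(output):
    """Return packet counts, loss percentage and RTT stats from ping output."""
    loss = LOSS_RE.search(output)
    if loss is None:
        raise ValueError("no ping summary line found")
    result = {
        "sent": int(loss.group(1)),
        "received": int(loss.group(2)),
        "loss_percent": float(loss.group(3)),
    }
    rtt = RTT_RE.search(output)  # this line is absent when every packet was lost
    if rtt is not None:
        keys = ("rtt_min_ms", "rtt_avg_ms", "rtt_max_ms", "rtt_mdev_ms")
        result.update(zip(keys, map(float, rtt.groups())))
    return result


# Invented sample in the iputils format, not a measurement from this thread.
SAMPLE = """\
--- 192.168.1.50 ping statistics ---
60 packets transmitted, 51 received, 15% packet loss, time 59412ms
rtt min/avg/max/mdev = 2.101/412.337/3120.554/801.220 ms
"""

if __name__ == "__main__":
    print(parse_ping_summary(SAMPLE))
```

Running the same fixed-length ping (for example 60 packets) from the PC against each kernel and saving the parsed result makes the difference between 6.2.x and 6.3+ easy to show in a bug report.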
Maybe I have to change another kernel compile option to get it working with newer kernels, but at the moment I use the same base config with Framework-specific settings for all kernels.
What I can see is that directly after boot, once WLAN is connected, the ping from my PC is quite good. But it only takes a minute or so before it starts getting worse.
I can’t use ssh, for example, to connect to my laptop because of the high RTT and packet loss.
At the moment I am staying on kernel 6.1.67, but I read that many people use newer kernels and they seem to work for them. So I wonder what I could try to find the culprit.
I repeated the tests, pinging the laptop from my PC with kernels 6.1.67 and 6.6.6, and had wavemon and dmesg open. The following screenshots were taken while the laptop was being pinged.
There seem to be some known issues with certain AP combinations and the mt76 stack. Disabling the power save features seems to resolve the dropouts and latency, so much so that it’s included in several ‘performance’-oriented builds. Here is one such patch:
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Jan200101 <sentrycraft123@gmail.com>
Date: Mon, 27 Nov 2023 15:25:48 +0100
Subject: [PATCH] mt76: mt7921: Disable powersave features by default

This brings WiFi latency down considerably and makes latency consistent by
disabling runtime PM and typical powersave features by default. The actual
power consumption difference is inconsequential on desktops and laptops,
while the performance difference is monumental. Latencies of 20+ ms are no
longer observed after this change, and the connection is much more stable.

Signed-off-by: Jan200101 <sentrycraft123@gmail.com>
---
 drivers/net/wireless/mediatek/mt76/mt7921/init.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/init.c b/drivers/net/wireless/mediatek/mt76/mt7921/init.c
index ff63f37f67d9..840b4c606c83 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/init.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/init.c
@@ -220,12 +220,6 @@ int mt7921_register_device(struct mt792x_dev *dev)
 	dev->pm.idle_timeout = MT792x_PM_TIMEOUT;
 	dev->pm.stats.last_wake_event = jiffies;
 	dev->pm.stats.last_doze_event = jiffies;
-	if (!mt76_is_usb(&dev->mt76)) {
-		dev->pm.enable_user = true;
-		dev->pm.enable = true;
-		dev->pm.ds_enable_user = true;
-		dev->pm.ds_enable = true;
-	}
 
 	if (!mt76_is_mmio(&dev->mt76))
 		hw->extra_tx_headroom += MT_SDIO_TXD_SIZE + MT_SDIO_HDR_SIZE;
@@ -240,6 +234,8 @@ int mt7921_register_device(struct mt792x_dev *dev)
 	if (ret)
 		return ret;
 
+	hw->wiphy->flags &= ~WIPHY_FLAG_PS_ON_BY_DEFAULT;
+
 	hw->wiphy->reg_notifier = mt7921_regd_notifier;
 	dev->mphy.sband_2g.sband.ht_cap.cap |=
 			IEEE80211_HT_CAP_LDPC_CODING |
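Before rebuilding a kernel with a patch like this, it may be worth checking whether switching off 802.11 power save at runtime already helps. The sketch below wraps the `iw dev <interface> get power_save` and `iw dev <interface> set power_save off` commands. Note the assumptions: the helper names are made up for illustration, the interface name has to be looked up with `iw dev`, and this only toggles the mac80211 power save setting that `WIPHY_FLAG_PS_ON_BY_DEFAULT` controls. It does not touch the driver's runtime PM fields that the patch also removes, so it is not equivalent to the patch.

```python
import subprocess


def parse_power_save(output):
    """Map the 'Power save: on/off' line printed by iw to a bool."""
    for line in output.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "power save":
            state = value.strip().lower()
            if state in ("on", "off"):
                return state == "on"
    raise ValueError("no 'Power save: on/off' line in iw output")


def get_power_save(interface):
    """Query the current power save state of a wireless interface."""
    result = subprocess.run(
        ["iw", "dev", interface, "get", "power_save"],
        capture_output=True, text=True, check=True,
    )
    return parse_power_save(result.stdout)


def set_power_save(interface, enabled):
    """Switch power save on or off (needs root, not persistent across reboots)."""
    state = "on" if enabled else "off"
    subprocess.run(
        ["iw", "dev", interface, "set", "power_save", state],
        check=True,
    )
```

Calling `set_power_save("wlan0", False)` as root (with `wlan0` replaced by the real interface name) and then repeating the ping test from the PC would show whether power save alone explains the latency.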
Well, I’m not sure where that patch actually came from; maybe it was already on a mailing list. It’s news to me, so I wanted to see what the MediaTek kernel developers have had to say about it.
There is an obvious power consumption trade-off with a patch like that, so I think it’s very important to quantify how much worse it makes things in some predictable workloads (idle, transferring content, etc.) to decide whether it’s a good idea in general.
If it’s just from a distro or a user somewhere, then I think it still needs to be raised upstream.
I suggest using ./scripts/get_maintainer.pl to find the right people to discuss it with and then sending it up for their feedback.
Maybe there is a happy medium: changing the power save policy so that it is not so aggressive for this card?
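One rough way to put numbers on the power trade-off is to sample the battery's sysfs attributes while running the same workload with and without the patch. This is a sketch under assumptions: the battery name `BAT1` is a guess and differs per machine, the helper names are made up, and the readings are only meaningful while the laptop is discharging on battery. The units follow the kernel's power_supply sysfs ABI (`power_now` in microwatts, `current_now` in microamps, `voltage_now` in microvolts).

```python
from pathlib import Path


def watts_from_readings(readings):
    """Convert raw power_supply sysfs values to watts."""
    if "power_now" in readings:
        return abs(readings["power_now"]) / 1e6
    # Fall back to current * voltage when power_now is not exposed.
    return abs(readings["current_now"]) * readings["voltage_now"] / 1e12


def read_battery(name="BAT1"):  # assumption: battery name differs per machine
    """Read whichever of the three attributes this battery exposes."""
    base = Path("/sys/class/power_supply") / name
    readings = {}
    for attr in ("power_now", "current_now", "voltage_now"):
        path = base / attr
        if path.exists():
            readings[attr] = int(path.read_text())
    return readings


def average_watts(samples):
    """Mean power draw over a list of raw readings."""
    values = [watts_from_readings(s) for s in samples]
    return sum(values) / len(values)
```

Collecting `read_battery()` once per second for a few minutes at idle and during a file transfer, then comparing `average_watts(...)` between the patched and unpatched kernel, would give the kind of figures the maintainers are likely to ask for.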
When I google this problem I can find different sites with similar patches. They are not exactly the same as the one @jwp posted, but they appear to do the same thing. Site1 Site2
When I run ./scripts/get_maintainer.pl -f drivers/net/wireless/mediatek/mt76/mt7921/init.c I get a few responsible people and linux-wireless@vger.kernel.org as the open mailing list, listed under MEDIATEK MT76 WIRELESS LAN DRIVER.
I read the following statement on the mailing list:
So although it’s not pretty to look at, bad ping times to the AP aren’t representative of the full user experience.
That’s true, but it’s not only the ping that is bad.
Without the patch I can’t ssh into my laptop, because the high round-trip time makes ssh unusably slow. Sometimes ssh freezes completely and no communication is possible.
Can you respond to the mailing list post with your observations? It’s not as severe for me at home so you may be able to help make the case for upstream changes.
@RW1 If you’re too busy to write to the mailing list, I might be able to do some tests this weekend and could try to send the mail myself. I will link the thread here so people can follow it and see when it gets fixed.
@Mario_Limonciello Any idea whether this fix will be backported? Do you know under which conditions a fix gets backported to older kernel releases?