[TRACKING] High Battery Drain During Suspend

@dma well, it doesn’t seem likely to help considering what they said, but for lack of a better option or an official response I’ll give it a try, yes. I just installed 3.08 and I’ll run some sleep sessions and report if I see any change.

FYI 3.08 fixes drain while shut down, not during suspend.

I just found this news about a patch from Intel related to energy consumption in my RSS feed today

I have not been able to follow up on all the details, but I’m wondering if this might be related?

@feesh yes, that’s why I said it doesn’t seem likely, but thanks for adding this. And as expected, 3.08 did not help with this problem: I can reproduce it easily after just one unplug.

@dfh nice finding, it could be yes. We’ll have to wait and track in which kernel version this patch lands I suppose.

Yeah, sadly I don’t have the time or the setup to quickly test the patch. I hope someone can make sense of the change and work out whether it impacts Framework laptops.
And hopefully be able to report back :+1:

Small update after being in a situation where I’m actually relying on the battery quite a bit.

The nvme.noacpi=1 option indeed makes a huge difference.

With Fedora 35, Linux 5.17.4-200.fc35.x86_64

I’m now seeing:
s2idle with HDMI and USB-A inserted: 1W
s2idle with just USB-C cards inserted: 0.34W

“idle” use (reading something on the screen, with rather low screen brightness): about 4W
“screen locked” use (screen off): around 2W

So I’d say the nvme.noacpi=1 has almost completely resolved the power drain: even with 1W, the machine can stay suspended for more than 2 days and without the HDMI and USB-A cards it’s an entirely decent score.
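For anyone who wants to redo the arithmetic behind "more than 2 days", here is a rough sketch. It assumes a 55 Wh battery (my assumption for the nominal capacity, it is not stated in the post) and a constant draw for the whole suspend, which real sessions will not quite match.

```python
# Rough estimate of how long a full battery lasts in suspend.
# BATTERY_WH is an assumption (nominal capacity), not a measured value.
BATTERY_WH = 55.0


def suspend_days(draw_watts, capacity_wh=BATTERY_WH):
    """Days a full battery lasts at a constant suspend draw in watts."""
    if draw_watts <= 0:
        raise ValueError("draw_watts must be positive")
    hours = capacity_wh / draw_watts
    return hours / 24.0


# The two s2idle figures quoted above.
for label, watts in [("HDMI + USB-A inserted", 1.0), ("USB-C cards only", 0.34)]:
    print(f"{label}: {suspend_days(watts):.1f} days")
```

With those assumptions the 1 W case comes out a bit over two days and the 0.34 W case a bit under a week.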

A consequence is that the extra power use of these cards really sticks out like a sore thumb. It would be great if some kind of switch were available to turn off the drain on these, even if it would require replugging them to make them functional again.

At the very least I think Framework should advertise that these expansion cards can affect the power profile, even if they’re not used. It would have made me buy two extra USB-C cards so that I can place the machine in a power-frugal setup without having gaping holes in the bottom …

I have also seen the kernel in such a state that it was still using about 4W while suspended. I suspect that was due to a crashed R8153 ethernet adapter driver (that ethernet chip seems to be particularly mercurial with USB-C setups in linux; hopefully the workarounds improve a bit in the next couple of kernels). With that power use the laptop feels warm to the touch after spending some time in a laptop sleeve/bag. After rebooting this has not reoccurred, but it does provide a good motivation for monitoring the drain a bit.

That’s what I had to do as well. Using 4 usb-c cards, and can’t remember when I last swapped them out…because now, the USB-A and HDMI cards are just acting as dongles in my use cases…when/if I need to use them.

This makes the ‘swappable’ use case relatively more niche than before.

@Nils thanks for your update which confirms most of what I saw. I agree the impact on battery life should be at least specified on adapters.

What makes you point to the R8153 ethernet adapter? Did you find any way to pinpoint the reason for the incorrect suspended state? Also, if that’s the same cause for me, I don’t need to reboot: simply waking up the laptop and putting it back to sleep usually fixes the problem. Every time I put it to sleep I have to check the temperature 30 minutes later to be sure it’s actually sleeping correctly… Not very cool.

Only circumstantial evidence made me blame it. I did not do an analysis of the suspend states hit by the various components and I don’t have a real lead that a defunct R8153 driver was causing the problem. The R8153 did fail before with a

r8152 … …: Tx status -71

message and an “Oops” traceback afterwards. The Belkin Multimedia USB-C hub it is part of would not show the network interface anymore afterwards: not after unplugging and replugging and also not after suspending and waking (and various combinations of the two).

After that the laptop would get warm when suspended and put in a bag and showed ~4W power use (no peripherals other than the usual expansion cards inserted); consistently.

Both indicate that the kernel is in a bad state. So I made a leap and assumed the two are correlated.

Rebooting fixes the R8153 problems (until the next crash when in use) and fixes the bad suspend as well.

Oddly enough, I have a USB-C monitor that also has an R8153 network interface built-in and that one causes no problems. I’ve used the Belkin hub successfully with other (non-unix) devices, so it’s either the network that makes the hub act up or it’s the way it’s wired in the hub that is difficult for the linux driver (unlike most R8153 problem reports, the error occurs when the network interface is in use, so likely not due to power saving issues)

Ok thanks, but when in a bag I suppose the Belkin hub is not plugged in? It was plugged in before, though, I suppose. As some other people are seeing the same high-power sleep bug, I suppose it’s unlikely to be related to this external interface. But who knows :slight_smile:

For comparison’s sake, what is the best-in-class power consumption using s3 and s2idle with any Linux laptop that exists? The latest from Lenovo seems terrible too. So perhaps we need to pressure AMD/Intel to work on this with the kernel devs, but they must have already noticed, right? It seems the best hardware sleeping with Linux today isn’t anywhere close to the power saving available on MacBooks that are over a decade old.

As I suppose we’re not going to be able to motivate anybody from the team to really look into this (not enough people complaining), I am asking myself one thing: if my Framework is burning hot for hours in my bag because of this bug and it causes permanent damage, will the warranty apply? Will I be able to get a free repair/replacement?

Same worry on my side: every time I close my laptop and open it again the next day, the battery is completely depleted. This can’t be good for the battery.
The battery on my laptop currently lasts about 3 hours, so I guess mine is already quite damaged, although I don’t think it ever lasted much more than 5 hours for me.

I wish the Framework team would acknowledge this issue, and at least release an official note/recommendations on what to do, or how to properly go about this in the meantime.

I miss the days when I was able to go out without the laptop charger. Now I know that if my Framework was left unplugged, it probably has no battery left by the time I want to use it.
Very frustrating, to say the least.

Same feeling here.

I’m worried when I suspend my laptop, put it in my backpack and 1h later find it almost burning my hands as if it wasn’t really suspended. And I was really frustrated whenever I forgot to plug the laptop in overnight.

I say “was” because for the first time in 20 years using Linux, I’ve turned on hybrid sleep on a laptop. When unplugged, I suspend the laptop for 2 hours and then it hibernates.

It takes more time to resume since it has to boot but at least I’m not on edge anymore.

I think that if it gets that hot, it isn’t properly suspended. And indeed, given that Lithium-ion batteries are apparently quite temperature-sensitive, also not good for the longevity of the product. At those power levels, it should be able to ventilate well.

There seem to be frustratingly many things that can prevent the suspend state from being properly entered, or lead to high power use. I’ve seen on this forum:

  • nvme ssd drives use excessive power during suspend (nvme.noacpi=1 seems to help in those cases)
  • higher power use due to a previously crashed kernel module
  • tricky setups with wake-up triggers, making the system wake up almost immediately from suspend (and hence not being suspended when placed in the bag)
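For the first item, a quick way to confirm that the option actually made it onto the running kernel's command line. This is only a sketch: `kernel_param` is a helper name made up here, and the simple whitespace split ignores quoted parameters that contain spaces.

```python
from pathlib import Path


def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns "" for a bare flag without "=value" and None if the
    parameter is absent. If it appears more than once, the last wins.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


if __name__ == "__main__":
    path = Path("/proc/cmdline")  # only present on Linux
    if path.exists():
        print("nvme.noacpi =", kernel_param(path.read_text(), "nvme.noacpi"))
```

If this prints `None` after a reboot, the bootloader configuration was probably not regenerated.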

I haven’t seen suspend problems in my setup lately, but some unexpected update could unfortunately change that. The current power usage means that on a full battery, the system can easily survive for 2 days suspended; going up to about a week with USB-A and HDMI removed. It’s not great, but workable if otherwise your laptop has a habit of being plugged in overnight.

To some extent, the power problems seem to stem from “modern suspend” s2idle, which makes for very responsive systems but in turn for rather high power usage. And apparently, because the system is hardly asleep at all, it is a state whose power use is very sensitive to other factors. I don’t think Framework can quite be blamed for that, and given the possible causes for increased power use, it looks difficult to give widely applicable advice/tips.
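To check which suspend variant a machine is actually using, the kernel lists the available modes in /sys/power/mem_sleep and puts the active one in square brackets. A small sketch of reading that (the function name is just illustrative):

```python
from pathlib import Path


def active_sleep_mode(mem_sleep_text):
    """Return the bracketed entry, e.g. 's2idle' from '[s2idle] deep'."""
    for entry in mem_sleep_text.split():
        if entry.startswith("[") and entry.endswith("]"):
            return entry[1:-1]
    return None


if __name__ == "__main__":
    path = Path("/sys/power/mem_sleep")  # only present on Linux
    if path.exists():
        print("active suspend mode:", active_sleep_mode(path.read_text()))
```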

They should warn people that expansion cards that are not just USB-C cause significant power use even during suspend, though!

Hello, I was playing around with turbostat and the s0ix debugging tools provided on 01.org and noticed some failures when testing s2idle.

[  880.618452] PM: Suspending system (s2idle)
[  880.618455] printk: Suspending console(s) (use no_console_suspend to debug)
[  880.619538] wlp170s0: deauthenticating from 94:83:c4:1f:4d:62 by local choice (Reason: 3=DEAUTH_LEAVING)
[  881.200712] PM: suspend of devices complete after 581.211 msecs
[  881.200717] PM: start suspend of devices complete after 582.169 msecs
[  881.200720] PM: suspend devices took 0.582 seconds
[  881.215569] PM: late suspend of devices complete after 14.843 msecs
[  881.241747] ACPI: EC: interrupt blocked
[  881.308356] PM: noirq suspend of devices complete after 92.029 msecs
[  881.308393] ACPI: \_SB_.PR00: LPI: Device not power manageable
[  881.308398] ACPI: \_SB_.PR01: LPI: Device not power manageable
[  881.308400] ACPI: \_SB_.PR02: LPI: Device not power manageable
[  881.308402] ACPI: \_SB_.PR03: LPI: Device not power manageable
[  881.308403] ACPI: \_SB_.PR04: LPI: Device not power manageable
[  881.308405] ACPI: \_SB_.PR05: LPI: Device not power manageable
[  881.308406] ACPI: \_SB_.PR06: LPI: Device not power manageable
[  881.308407] ACPI: \_SB_.PR07: LPI: Device not power manageable
[  881.308413] ACPI: \_SB_.PC00.RP10.PXSX: LPI: Device not power manageable
[  881.308415] ACPI: \_SB_.PC00.HECI: LPI: Device not power manageable
[  881.308417] ACPI: \_SB_.PC00.PEG0.PEGP: LPI: Constraint not met; min power state:D3hot current power state:D0
[  881.308422] ACPI: \_SB_.PC00.GNA0: LPI: Device not power manageable
[  881.310108] PM: suspend-to-idle
[  893.693629] Timekeeping suspended for 11.610 seconds
[  893.693860] ACPI: PM: Wakeup unrelated to ACPI SCI
[  893.693863] PM: resume from suspend-to-idle
[  893.696003] ACPI: EC: interrupt unblocked
[  894.122699] PM: noirq resume of devices complete after 426.935 msecs
[  894.126365] PM: early resume of devices complete after 3.551 msecs
[  894.642587] PM: resume of devices complete after 516.134 msecs
[  894.651955] PM: resume devices took 0.525 seconds
[  894.651973] PM: Finishing wakeup.
[  894.651975] OOM killer enabled.
[  894.651976] Restarting tasks ... 

I am curious if these \_SB_.PR0x power management failures mean anything to anyone?
I am guessing these are platform features that are not suspending when the machine is put to sleep causing the suspend power drain.
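To make logs like this easier to compare between machines, here is a small sketch that separates the two kinds of LPI messages. My reading, which may be wrong, is that "Device not power manageable" only says the kernel cannot control that device's power state, while "Constraint not met" names a device that stayed in a shallower state than the platform asked for (PEG0.PEGP in the log above). The regular expression is written against the lines quoted here and is not an official format.

```python
import re

# Matches lines such as:
#   [  881.308393] ACPI: \_SB_.PR00: LPI: Device not power manageable
LPI_RE = re.compile(r"ACPI: (?P<device>\S+): LPI: (?P<message>.+)$")


def lpi_report(log_lines):
    """Split LPI messages into unmet constraints and unmanaged devices."""
    unmet, unmanaged = [], []
    for line in log_lines:
        match = LPI_RE.search(line.rstrip())
        if match is None:
            continue
        device, message = match["device"], match["message"]
        if message.startswith("Constraint not met"):
            unmet.append((device, message))
        elif "not power manageable" in message:
            unmanaged.append(device)
    return unmet, unmanaged


sample = [
    r"[  881.308393] ACPI: \_SB_.PR00: LPI: Device not power manageable",
    r"[  881.308417] ACPI: \_SB_.PC00.PEG0.PEGP: LPI: Constraint not met; "
    r"min power state:D3hot current power state:D0",
    r"[  881.310108] PM: suspend-to-idle",
]
unmet, unmanaged = lpi_report(sample)
for device, message in unmet:
    print(f"{device}: {message}")
print(f"{len(unmanaged)} device(s) reported as not power manageable")
```

Feeding it the output of `dmesg` after a suspend cycle should surface the unmet constraints first, which are probably the more interesting lines.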

This is kind of a joke. 1260p, hybrid sleep disabled, 2x USB C, 2x USB A

Why am I wasting hours of my life trying to diagnose this on such an expensive product? How is this acceptable on an enthusiast oriented machine?

Yup…I’ve been there with the frustration. All very good questions.

All I can say is: Some people here on the forum confused “enthusiast oriented machine” with “wanting to tinker”. In my book, the two are not one and the same.

I’m doing a new run of benchmarks of the suspend-time power usage here, using batterylog and my homegrown solution. Hopefully I should be able to confirm or refute other results people might have had here. So far I have found those results to be interesting:

Also, reading this thread is kind of painful for two reasons. One, it’s kind of sad how much trouble everyone is having with their battery life. I get it, trust me, I’m in the same boat. But second, it’s kind of a shame it goes a little off the rails…

I don’t think it’s productive to beat the Framework Team over the head with this kind of stuff. They’re working on those issues, maybe not fast enough for your taste, but keep in mind this is a young startup. I don’t think it’s fair to expect them to produce a laptop that has the battery life of a Carbon X1…

Anyways, next up is results, I’m hoping to get something more solid some time this week. My assumptions are that, as usual, the expansion cards are the main culprit, that deep sleep saves a lot of power, and that nvme.noacpi=1 won’t have an effect on my laptop. Apparently, that setting might be specific to s2idle!

Update: tests are not complete just yet, but first results are trickling in. So far I am confirming the results that the USB-A modules take almost half a watt on standby (500mW for the first, 370-380mW for the n+1), which is much worse than what they do on idle.
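As a back-of-the-envelope model of those preliminary numbers: a sketch that adds up the suspend overhead for a given number of USB-A cards. The 0.375 W default is just the midpoint of the 370-380 mW range above, and the function name is made up for illustration. More tests may well change the figures.

```python
def usb_a_suspend_overhead_w(n_cards, first_w=0.50, additional_w=0.375):
    """Extra suspend draw in watts attributed to n USB-A expansion cards.

    first_w: draw of the first card; additional_w: draw of each further
    card (midpoint of the 370-380 mW range, an assumption).
    """
    if n_cards < 0:
        raise ValueError("n_cards must not be negative")
    if n_cards == 0:
        return 0.0
    return first_w + (n_cards - 1) * additional_w


for n in range(0, 5):
    print(f"{n} USB-A card(s): {usb_a_suspend_overhead_w(n):.3f} W")
```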

It seems like something kicks those cards into high gear on suspend; it’s quite bizarre. I’m going to run more tests to confirm this, with more USB-A cards. I’ll also run the tests with all the other cards I can lay my hands on, but right now they really seem to be the main culprit behind the fluctuations in my suspend battery performance.

I’m actually considering creating a whole new thread specifically about expansion card battery use during suspend, since this one is becoming quite long…

I’m a little confused as to how this works.

First off, here, -m freeze doesn’t enter a deep sleep state, at least not as power-saving as s2idle or deep anyways, so it’s normal you’re seeing bad results from this.

But more importantly, I don’t actually understand how or even if turbostat can give you power usage results at all. Normally, the CPU is basically stopped on deep suspend, you only deliver power to the memory to keep that alive, and then resume processing tasks in the CPU once the computer resumes from sleep.

How can turbostat tell anything that happened then? It’s basically suspended… or is it using some other tricks there?
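One possible answer, as far as I understand it: nothing runs on the CPU during suspend, but some platforms keep hardware residency counters that continue counting while the system is in its low-power idle state, and a tool can read them before and after the suspend and compare. The sketch below assumes the kernel exposes /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us, which not every platform does. The helper names are made up for illustration.

```python
from pathlib import Path

# Cumulative microseconds the whole system spent in its low-power idle
# state. Assumption: this file exists on the machine being tested.
RESIDENCY_PATH = Path(
    "/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us"
)


def read_residency_us(path=RESIDENCY_PATH):
    """Read the cumulative residency counter in microseconds."""
    return int(Path(path).read_text().strip())


def residency_percent(before_us, after_us, suspended_seconds):
    """Share of the suspend spent in the low-power idle state, in percent."""
    if suspended_seconds <= 0:
        raise ValueError("suspended_seconds must be positive")
    return 100.0 * (after_us - before_us) / (suspended_seconds * 1_000_000)
```

Usage would be: call `read_residency_us()` before suspending, again after resume, and pass both values together with the suspend duration (for example the "Timekeeping suspended for ..." figure from the kernel log) to `residency_percent`. A value near zero would suggest the platform never reached its deepest idle state.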

Also, has anyone here actually tried to report this explicitly to the @Framework team?

What I’m seeing right now is major power drain issues with expansion cards, specifically on suspend. It seems like they enter a different power profile than when the computer is idle. So it could be something relatively simple to fix, since the difference really is between suspend and idle…

Anyways, once I get my actual results, I’ll open a new thread about this and cross-reference it here. I’m also considering just opening a formal support request to see if we can get this moving in the team as well.