Merging this as multiple threads for a related issue simply makes our job that much more difficult to track.
I know! I was the one who spent days tracking this down and reading the EC firmware code.
I’ll give it a try because I want to see this fixed, but I really feel it’s your responsibility right now to spend time on it. I spent days tracking this down, reading EC firmware code, etc.
Support is ignoring my story and asking questions like “do you see this on other OS” but I clearly stated in the thread that people have reported this on all kinds of operating systems. I gave the same reply last time, but they keep ignoring what I write. Last time they asked me “Does the issue only happen when plugged in or in battery mode?” when even the title of my report already says that this is only when AC is connected. It’s hardly possible to have a meaningful conversation with your colleagues if they don’t read what I write in the ticket.
edit: Interesting to see a similar experience here: [RESPONDED] Fedora 38 - Kernel 6.4 - Lenovo Thunderbolt 3 Dock Black Screen - #16 by Christopher_Bates
I think they should have been kept separate. As I wrote in the comment above, the issues are probably a bit related, but still separate.
Something else, here’s a bash version of my reproduction script by the way… The version above uses the fish shell.
while true
do
date ; sudo ectool fwchargelimit 30 ; sudo ectool console | grep "Battery\|Charge Limit" ; sleep 60 ; date ; sudo ectool fwchargelimit 99 ; sudo ectool console | grep "Battery\|Charge Limit" ; sleep 10
done
As stated previously, please work with support on this.
Also do not insinuate support is ignoring you. Do as they ask, work the process.
Honestly @Matt_Hartley , it does feel like they are ignoring the data, or not properly noting / escalating. In my last request, they asked for all the same files I originally sent to the initial support contact.
I also explained to the support rep that there is nothing in the data, and according to what @real_or_random found, they won’t see anything in journal dumps as it is happening at a lower level.
I am still willing to capture this if I get time this weekend, but it requires that I disable the service that @real_or_random created to address this and basically break my laptop again, so I cannot do it during the work week.
If you feel like it’s going nowhere, and you have worked through the steps requested, ask for it to be escalated. That is reasonable if the requested steps and processes have been met.
I am looking at your ticket reply from the 17th (yesterday). You’ve indicated a workaround, and it looks like Support is waiting for you to provide the requested information. Generally speaking, once this is done, it will be escalated from there unless they have direct insights/suggestions.
One of the biggies is the logs. Once you’ve submitted that, I suspect it will be escalated that same day.
Per support:
“With the logs, kindly send it to us as we still need the results for documentation purposes.”
Everyone reading this, allow me to reset expectations.
- Support uses a process. It may not be ideal in your eyes, but there is method to it. Please follow it for the best results.
- Support is going to be slowed down as we are slammed right now. No one is being ignored, but there is a hefty queue they’re working through.
- Posting here does not change anything above. Most of the time, issues we see folks facing are due to third party devices involved, missed steps or in some cases, a needed RMA after all possible other causes have been exhausted. Standard stuff.
- While the process feels long and painful, no one is being ignored, dismissed or otherwise treated as if the issue they experienced isn’t valid.
That said. I am not looking to discuss this further as I have a multitude of other customers needing attention. They, like you, need my assistance.
And with that, all tickets will absolutely get followed up on and escalated as described above when the escalation requirements are met. Thank you for your cooperation.
If you would like to share workarounds, solutions and ideas, please do so - we encourage this. If you are looking to vent frustration or echo existing tickets, I will be closing this thread down.
I do not want to do that as there are valuable insights shared here, but this is getting long in the tooth.
Posting this here. It’s a quick shell script I put together based on what was sent from support. If someone is able to reproduce this sooner than I am, and has a case already, feel free to use it to grab the data.
It will generate a tar.gz containing the requested data to “${HOME}/Downloads/” by default, but it can be modified by changing the path defined in the DIRECTORY variable.
Nothing crazy, but figured I would put it out there to try to get this thing moving faster.
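For readers who want a starting point, here is a minimal sketch of that kind of collection script. It is not the script from this post, and the thread does not say which files support requested: the `journalctl` and `dmesg` calls are placeholders chosen purely for illustration, and the function name `collect_logs` is hypothetical.

```bash
#!/bin/bash
# Sketch only: the data support actually asks for is NOT listed in this thread.
# journalctl and dmesg below are illustrative placeholders.
collect_logs() {
  local out_dir="$1"          # where the tar.gz ends up
  local stamp work archive
  stamp="$(date '+%Y%m%d-%H%M%S')"
  work="$(mktemp -d)"
  archive="$out_dir/logs-$stamp.tar.gz"
  mkdir -p "$out_dir"
  # A failing or missing command is recorded in the file instead of aborting.
  journalctl -b --no-pager > "$work/journal.txt" 2>&1 || true
  dmesg > "$work/dmesg.txt" 2>&1 || true
  tar -czf "$archive" -C "$work" .
  rm -rf "$work"
  printf '%s\n' "$archive"    # print the path of the bundle
}

# Usage, mirroring the DIRECTORY variable described above:
#   DIRECTORY="${HOME}/Downloads"
#   collect_logs "$DIRECTORY"
```

Swap the two placeholder commands for whatever support lists in your own ticket.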
Very considerate, thank you. Support will provide a one liner that works in a similar fashion.
Of course, now that I am trying to force this to happen, I cannot get it to trigger. I disabled the service provided by @real_or_random , and I think I reverted the EC settings looking through it, but I cannot get this to happen and it’s been three days.
If there is anything you can think of that I missed in the list below I would be happy to keep driving this with support if I can break it again.
Changes
- Manually stopped charge-limiter.service: systemctl stop charge-limiter.service
- Disabled charge-limiter.service
- Reverted chargecurrentlimit to “NO_LIMIT” (4294967295) by hand (tested with fw-ectool as well): ectool chargecurrentlimit 4294967295
- Tested recreation script (fw-ectool changed from original as I have both ectool and fw-ectool on my system now):
  #!/bin/bash
  while true
  do
  date ; fw-ectool fwchargelimit 30 ; fw-ectool console | grep "Battery\|Charge Limit" ; sleep 60 ; date ; fw-ectool fwchargelimit 99 ; fw-ectool console | grep "Battery\|Charge Limit" ; sleep 10
  done
- Have run in state with service disabled for 3 days (2 of normal workload, 1 of light workload)
Do you still have a battery limit <100% set in the BIOS? Have you tried the script with a battery level between 31% and 99%? This is required for the fwchargelimit commands to be effective. (Only then fwchargelimit 30 stops charging and fwchargelimit 99 starts charging.)
Otherwise you may actually be lucky. I can still reproduce it reliably with the script.
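The 31% to 99% precondition can be checked before starting the reproduction loop. This is a minimal sketch: the function name `level_in_repro_range` is hypothetical, and reading the level from `/sys/class/power_supply/BAT1/capacity` assumes the battery shows up as BAT1 on your machine.

```bash
#!/bin/bash
# Succeeds (exit 0) only if the battery level is between 31 and 99 percent,
# the range in which toggling fwchargelimit 30 / 99 actually stops and
# starts charging, per the explanation above.
level_in_repro_range() {
  local level="$1"
  [ "$level" -ge 31 ] && [ "$level" -le 99 ]
}

# Usage (BAT1 is an assumption, adjust to your system):
#   level="$(cat /sys/class/power_supply/BAT1/capacity)"
#   level_in_repro_range "$level" || echo "Battery at ${level}%, script will have no effect"
```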
Yea, I used the script, and I still have it set to 80% charge in the BIOS. I agree that I may be lucky (though it feels the opposite now that I am trying to capture it). I also ran for months before I first encountered this. I will go ahead and leave it in this state, and see if I can get it to come back.
Had to smile because I can relate (for me, it was something with my car). But I digress.
Do keep us posted.
I finally captured it, and got an exact timestamp of the disconnect. Log bundle uploaded to support.
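For anyone else trying to pin down a timestamp, one possible approach is to poll the battery status in sysfs and print a line whenever it changes. This is a rough sketch rather than what was used for the capture above; `watch_status` and its arguments are hypothetical names, and the BAT1 path is an assumption about your hardware.

```bash
#!/bin/bash
# Polls a status file and prints a timestamped line on every change.
#   $1: file to poll (e.g. the sysfs battery status file)
#   $2: seconds between polls (default 1)
#   $3: number of polls, 0 means run until interrupted (default 0)
watch_status() {
  local file="$1" interval="${2:-1}" max_polls="${3:-0}"
  local prev="" cur polls=0
  while [ "$max_polls" -eq 0 ] || [ "$polls" -lt "$max_polls" ]; do
    cur="$(cat "$file" 2>/dev/null)"
    if [ "$cur" != "$prev" ]; then
      printf '%s %s -> %s\n' "$(date '+%F %T')" "${prev:-start}" "$cur"
      prev="$cur"
    fi
    polls=$((polls + 1))
    sleep "$interval"
  done
}

# Usage (BAT1 is an assumption):
#   watch_status /sys/class/power_supply/BAT1/status 1 | tee -a ~/battery-status.log
```

The resulting timestamps can then be lined up against the logs sent to support.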
Same problem here, with an 11th-gen and a 13th-gen Intel.
Original charger, original cable.
After months of this problem, I decided to keep track of the charging power with node_exporter and Prometheus.
Turns out, even when fully charged, the battery will sometimes report charging of a couple mW (from 20ish to 200ish). This doesn’t seem to have a hard correlation with system usage.
But another very interesting thing I noticed after playing around with the official Ubuntu battery life guide: I turned off CPU boosting completely with TLP, and the “battery charging / discharging” desktop notifications are now completely gone. So the main culprit of the flipping seems to be the very short bursts of required power during clock boosts.
Here’s a quick graph showing the charge value (calculated with node_power_supply_current_ampere * node_power_supply_voltage_volt from node_exporter).
There’s a tiny bump in load when the battery is charging for a bit, but it’s not a lot.
I have now spent an afternoon playing around with different settings in TLP and the result is… confusing?
No matter what I configure now, I can’t get the power flip-flop to trigger. Even with boosting enabled. (including GPU boosting).
The “charging” power keeps at absolutely 0 in my monitoring, and the “fully charged” status doesn’t budge even for a second. And before it was happily doing it multiple times a minute, at times.
The only thing I can see is that the maximum CPU frequencies are not reported as high as before.
Example:
You can see that node_exporter detects the max frequency at 4.6GHz for the P-cores, which is the maximum spec for my 1340P (the short down-spikes you can see are moments where I disconnected power to move the laptop).
I thought those were the actual frequencies the core will boost to, so node exporter reports those.
If I had the laptop disconnected, you can see the reported max to be lower (as boost was disabled on battery)
But now I can’t get node_exporter to detect the 4.6GHz, no matter what I do. And I suspect that is connected with the fact that the connection is not flip-flopping anymore.
Here’s a screenshot of my current TLP CPU settings
So, uh, I have no idea what is going on. I will keep an eye out and see if it returns after a couple reboots, maybe.
For completeness sake, I’m running Xubuntu Jammy on an Intel 1340P
Additionally, it seems my Ugreen 45W charger also works like a charm now.
I tested it under full load (using all cores with blender rendering)
Note: I have a system tray widget that shows me the power drain/charge from the battery by using /sys/class/power_supply/BAT1/current_now and /sys/class/power_supply/BAT1/voltage_now
- started off as “fully charged”
- after 10ish seconds, my widget showed me a battery drain of 0.5W
- A couple seconds later it increased to 2W
- Only after another couple seconds (maybe half a minute after starting the render), my system reported the battery discharging
- When the render was done, the system reported normal charging of the battery
- After it was full again, the system is still happy with the 45W charger. I have now even attached my phone to it, so the charging power is halved. Still no problems
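The widget calculation mentioned in the note above can be approximated in a few lines of shell. This is a sketch, not the actual widget: `power_watts` is a hypothetical helper. It relies on the Linux power_supply sysfs convention that current_now is reported in microamps and voltage_now in microvolts, so the product is divided by 1e12 to get watts.

```bash
#!/bin/bash
# Converts raw sysfs readings into watts.
#   $1: current in microamps (current_now)
#   $2: voltage in microvolts (voltage_now)
power_watts() {
  awk -v ua="$1" -v uv="$2" 'BEGIN { printf "%.2f\n", (ua * uv) / 1e12 }'
}

# Usage (BAT1 as in the note above):
#   bat=/sys/class/power_supply/BAT1
#   power_watts "$(cat "$bat/current_now")" "$(cat "$bat/voltage_now")"
```

Note that the value alone does not say whether the battery is charging or discharging; the status file in the same directory is needed for that.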
Hm, I guess these settings in theory shouldn’t trigger what you see, i.e., that the CPU is limited even on AC. But I’d recommend disabling all the options, and starting from scratch.
Hm, I just took a look at Optimizing Ubuntu Battery Life, and I think the recommendations are a bit strange. In general, I’d recommend keeping most of the defaults. TLP has pretty good defaults, and many ticked checkboxes in the guide are not necessary because they correspond to the default. And where they differ, what the guide recommends can be questionable. For example, CPU_SCALING_GOVERNOR_ON_AC=performance seems to be a strange recommendation. Okay, it doesn’t matter that much on AC, but performance just disables all CPU energy optimizations, even those that make sense on AC and don’t hurt performance noticeably. Also, this setting just means that CPU_ENERGY_PERF_POLICY_ON_AC will be ignored anyway, so it’s rather confusing to set both. See intel_pstate CPU Performance Scaling Driver — The Linux Kernel documentation for details.
The recommendations may seem odd, but the idea is a user is either connected to power or not, thus disabling everything in terms of noticeable performance or not noticeable performance. It’s merely a suggestion based on what worked best for us.
Do try the TLP defaults to see if you end up with any difference in flipping.
@real_or_random agreed that most settings don’t make much sense.
After playing around some more, I can definitely say that the boosting is causing this.
But that includes the GPU boosting!
For 2 days now, I have only had the following settings set via TLP (and kept BAT and AC settings the same):
- CPU_BOOST set to off
- CPU_HWP_DYN_BOOST set to off
- GPU frequencies set to the max that is output via tlp-stat -g, but with the boost frequency set to the regular max frequency, so it doesn’t boost up.
Important: it seems that TLPUI doesn’t always load the new settings when saving, so a quick tlp start is often needed to actually apply the changes (I suspect this behavior screwed with some of the tests I did before).
This works on an i7-1260P as well as an i5-1340P Framework.
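For those who prefer editing the config directly instead of using TLPUI, the CPU part of the settings above could be written as a TLP drop-in file. This is a sketch under assumptions: I am mapping "CPU_BOOST" and "CPU_HWP_DYN_BOOST" to TLP's `_ON_AC` / `_ON_BAT` parameter pairs, the file name `99-no-boost.conf` and the helper `write_no_boost_conf` are hypothetical, and the GPU frequency settings are left out because their values depend on what `tlp-stat -g` reports on your machine.

```bash
#!/bin/bash
# Writes a TLP drop-in that disables CPU boost and HWP dynamic boost,
# with identical values for AC and battery as described above.
#   $1: path of the file to write
write_no_boost_conf() {
  local target="$1"
  cat > "$target" <<'EOF'
CPU_BOOST_ON_AC=0
CPU_BOOST_ON_BAT=0
CPU_HWP_DYN_BOOST_ON_AC=0
CPU_HWP_DYN_BOOST_ON_BAT=0
EOF
}

# Usage (needs root; file name is hypothetical):
#   write_no_boost_conf /etc/tlp.d/99-no-boost.conf
#   tlp start    # re-apply settings, as noted above
```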
Do keep sharing what you find, as I lack the cycles right now to dedicate specific time to replicate this. If you find something hyper-specific we can improve in docs, awesome: share how you arrived there, along with your specific hardware and TLP config.