[TRACKING] Battery flipping between charging and discharging / Draws from battery even on AC

“This release modifies the battery charge limit functionality to add a 5% float range. This allows us to reduce the number of microcycles on the battery when the CPU turbos.”

Will the rest of the FL13 lineup see this fix coming down the line?

Ideally, it would be, IMO, better if the float range can be user-defined. (The “Start Charging Threshold”…mentioned / requested in May 2022)

ac = 1
chg_voltage = 17600mV
chg_current = 2740mA
chg_input_current = 4500mA
batt_state_of_charge = 44%

No battery temperature display, and the chg_voltage is incorrect, should be 17800mV, that’s not flipping but still related to EC and it also has negative impact on the battery.

Neither do thermalget

sensor  warn  high  halt   fan_off fan_max   name
  0      343   353    393    313     343     local_f75303@4d
  1      343   353    393    319     327     cpu_f75303@4d
  2      343   353    393    401     401     ddr_f75303@4d
  3      381   388    400    376     378     cpu@4c
(all temps in degrees Kelvin)

What am I missing, does the EC code make it show the temperature?

I took a look at the common/charge_state_v2.c code in the Framework EC firmware Github defining the charge_request function.

Most of the logic is over my head, but I think the charge_control calls set the buck/boost voltage level to some small offset above whatever the EC thinks the current battery voltage is. If the CPU spikes cause a momentary battery voltage sag, that would explain why the charger voltage shown by ec-syslog gets lower with each function call.

Here is what I think is the relevant section of code.

#ifdef CONFIG_CHARGER_NARROW_VDC
		current = 0;
		/*
		 * With NVDC charger, keep VSYS voltage higher than battery,
		 * otherwise the BGATE FET body diode would conduct and
		 * discharge the battery.
		 */
		voltage = charger_closest_voltage(
			curr.batt.voltage + charger_get_info()->voltage_step);
		/* If the battery is full, request the max voltage. */
		if (is_full)
			voltage = battery_get_info()->voltage_max;
		/* And handle dead battery case */
		voltage = MAX(voltage, battery_get_info()->voltage_normal);
#else
		voltage = current = 0;
#endif

I also found this page with mainboard schematics for different FW13 laptops. Looking at the schematic for the AMD 7040 series laptops specifically, it appears they use a model ISL9241 buck/boost converter. Apologies if the existence of these schematics were already common knowledge, they were new to me.

1 Like

Well, that is what the EC firmware source code says, but the question is, is that the way it is supposed to work if it was doing NVDC correctly, according to the standards and appropriate datasheets. How much should one really trust that code snippet is doing it correctly.
As it turns out, that code snippet is doing it wrongly or it has got a bug in it.
“curr.batt.voltage” is the current battery voltage, as measured by the Battery BMS.
The ISL9241 datasheets say that should use the charger VBAT measured value.
While those two values might be similar, I think it is fair to assume that the charger’s own ADCs that measure VSYS and VBAT will be calibrated so that a comparison of the two makes sense, rather than using the value from the battery BMS, that has unknown calibration accuracy.
Now, what I have just said here, if fixed by a one-line fix, might actually fix this problem for everyone, as it is an obvious bug in the EC firmware currently.
As I have said before, I don’t really think the EC firmware code is particularly high quality, with yet another example of a bug here.

According to git blame, that line of code was written/last changed in 2015-10-28.
So, its fair to say, that is probably before FW existed!!!

I am not sure the command and the logic is even correct.
One of the application notes for the ISL9241 says otherwise:
https://www.renesas.com/en/document/apn/r16an0002-isl9241-operation-modes-application-note?r=507701
In the section:
2.2 Transitioning from Bypass to NVDC or NVDC + CHRG
“… The ISL9241 output and system are ramped down close to the battery voltage so that
charging can be enabled safely any time…”
But. In step 3: “Set MaxsysV = VADP (from ADC or PD controller) - 2V. For > 20V adapter, set to max of 18.304V”. So this starts the MaxSysVoltage close to the adapter voltage so bypass can be switched off, and it then is ramped down in step 5.
For step 5 it has “Set MaxSysVoltage = Battery full charge voltage after 1ms”
So, “battery voltage” here means “Battery full charge voltage”, not the actual current battery voltage or VBAT right now.
So, these instructions basically start the MaxsysV at 20V - 2 = 18V.
These instructions then suggest it ramps down to what the battery full charge voltage is (read from the battery) is, which is about 17.6V.
This is not the same as the code above that is ramping it down the the current charge level + 8mV, which is commonly about 15.8V.

There is nothing in the ISL9241 datasheets that suggest NVDC mode should have MaxsysV set to current_batteryV + 8mV.

So, one could say that the datasheets are also ambiguous in this area.

FYI, all the ISL9241 datasheets are here:

I would be interested to hear what other people think should be happening (ignoring the EC Firmware source code), and just basing it on the datasheets.

4 Likes

Here is an example to demonstrate why the “Battery flipping” is happening, and how to FIX it !!!
The EC Firmware has set SysMaxV: 15888 mV (What the Charger Buck/Boost tries to reach)
The PSU is reaching VSys: 15648 mV, (after being Boosted up from 8352 mV).
The Charger is seeing VBAT: 15744 mV (not the Battery BMS of: Present voltage: 15886 mV)

So the PSU VSys output is below the Battery, so it starts discharging the battery a little:
Present current -10 mA

I think that is wrong behavior. If SysMaxV was set to battery Desired voltage: 17800 mV (as per the ISL9241 datasheets).
It would be safe, not too much for the battery, and it would bring the PSU VSys voltage up to something above the VBAT. (Be boosted a bit more)
If the CPU asked for more power than the PSU can do, the VSys would start dropping until the VBAT helps out. (It can only be boosted up if the PSU can supply the power)

A separate line of investigation around PROCHOT is also shown, as all the status below has the various charger PROCHOT off.
WOCP is also controlled to be off. This is the 20V, 12A protection feature (that I mentioned in previous messages) that might have affected the 240W PSUs. So the WOCP protection is disabled by the EC firmware. So not contributing to a 240W PSU problem.

Details below:
PD: 8545.0 mV, I: 1982.00 mA, 16.94 watts
ADP: 8352.0 mV, I: 1824.00 mA, 15.23 watts
BAT: 15744 mV, discharge: 0 mA, 0.00 W, charge 0 mA, 0.00 W
SYS: 15648.0 mV, I: Unknown, Max 15888 mV
Tj-C: 43.4 C
control0: 0x0000
control1: 0x0303
control4: 0x0304
NGATE control: On
BYPASS control: Off
BGATE control: Normal / Off
BGATE control: Normal / On
Reverse Turbo Boost control: Off
WOCP Function control: Off
information1: 0xC049
Diode mode status: On
Reverse Turbo mode status: Off
Bypass gate power good status: Off
NGATE power good status: On
Trickle Charging mode status: Off
Comparitor status: CSIN < CSOP
BGATE power good status: On
Low_Vsys_PROCHOT# Status: Off
DC PROCHOT# Status: Off
AC PROCHOT# Status: Off
Active Control Loop: Input adapter current limit loop is active
Internal Reference Status: On
information2: 0x4C35
PROG readout: 0x15
Power Stage Operation Mode: 1:Forward Boost mode
State Machine: C:VSYS. The charger is in the forward mode and switching, the system voltage or adapter current (or input voltage) can be regulated in this state.
Battery Status: Present
General Purpose Comparitor Status: Low
PSU / AC Adapter Connected Status: Present

Interfaces:0xffffffff
comm_init_dev being used /dev/cros_ec
Battery 0 info:
OEM name: NVT
Model number: FRANDBAT01
Chemistry : LION
Serial number: 0084
Design capacity: 5491 mAh
Last full charge: 5766 mAh
Design output voltage 15480 mV
Cycle count 82
Present voltage 15886 mV
Present current -10 mA
Remaining capacity 3463 mAh
Desired voltage 17800 mV
Desired current 5491 mA
Flags 0x0b AC_PRESENT BATT_PRESENT CHARGING

2 Likes

Have you already tried this? Should be relatively save as the battery bms can allways disconnect itself if it doesn’t like what’s going on.

Not sure how much I like an ocp feature being completely disabled instead of just being set to a reasonable level. Given how trigger happy framework seems to be with ocp in other places (usb out) that is a bit of a surprise.

I have not tested changing the MaxSysV. The EC source code is rather complex so I don’t yet know how to do it safely without any side affects. That part of the EC source code is unnecessarily complex in my view. In my view, it should take the higher level state, NORMAL, IDLE, DISCHARGE and use a different rule set for each, eventually reaching a sensible set of values to control the charger chip.
It should have two states.

  1. Current state.
  2. Intended state.
    If (1) != (2) take gradual, appropriate steps to get closer to (2) and just follow the transition steps described in the ISL datasheets.

The WOCP limit levels are not configurable. Its a fixed 12A for the Adapter (PSU) limit (its at 20V by that point).

They cut that one really close then XD.

Would it make sense to just make it (temporarily) runtime changeable to test if it works in general before trying to implement it cleanly?

Well, there are some very important edge cases to consider.

  1. It needs to act differently if the battery is ill.
  2. If the battery is at or near 0%, it has a special “limp” mode, where it starts with a very small charge current, to try and coax the battery back to life. Lithium Ion batteries don’t like being less than 20% charged, and positively get upset below 4%.

I don’t want anything I do, to upset these edge cases, and any other edge cases hiding in the code.

The fact that the previous engineers that have edited the EC firmware code have added comments like “it is too complex to try X…” does not bode well towards a fix.
Its much like the “Here be dragons!!!” comments scattered over the Linux Kernel source code. Its a general polite warning to never touch that section of code, unless you have considerable expert knowledge of all the edge case any changes might affect.

I think the next thing I am going to look into is why PROCHOT_CPU and PROCHOT_GPU seems to be set, but then hardly ever cleared when you return to idle.

The point wasn’t to make it changeable as a final solution but to test if changing the voltage would help before trying to make a nice solution that does cover the edge cases.

Though most of those edge cases should be handled using a current limit not fiddling with the voltage (of course the current limit fiddles with the voltage but that’s the charge controllers problem). You can set the current limit for battery charging right not just the voltage?

This may be the cause of those “cpu stuck in limp mode” posts we get here sometimes XD.

Are there two buck boost one between DC bus and PD the other between DC bus and battery? Or just one

Isn’t PROCHOT means processor hot? There’s no such a thing that a Low system voltage causing the processor to become hot. A lowered voltage actually reduces the energy loss of VRM as the voltage difference is smaller. or the EC could just be unprofessionally using the PROCHOT flag arbitrarily

The name is kind of misleading, it’s more of a general “AAAAAAAA somethings wrong got to limp mode” kind of signal and can get triggered by all kinds of stuff. On old thinkpads plugging in a too weak power supply would also set prochot for example.

It does mean something is not good and throttling may prevent it from crashing.

It also may make the thing not work XD

It’s more of a the name is misleading bit.

in my case those flags are on whenever I plug or unplug the AC adapter ( I’m using Framework 180W power adapter ) after I’ve booted my system.

Unless I cold/warm reboot or suspend/resume, they stay on, maybe that will work on FW13?

I’m on a Framework 16 with 7840HS, no dGPU, BIOS 3.05

Back when I used to use and AMD 3500U, I noticed that whenever I plugged/unplugged the AC adapter, the processor frequency would drop to 400MHz~ for a short time and then change back to normal.

think like MEMORY_RESERVED term instead of BLACKLIST_MEMORY, not everyone needs to blacklist memory for reservation reasons, but makes more sense to mark it as reserved and then later explain why it’s reserved

from here is a guess on my part

It’s possible that AMD or whoever didn’t exactly think ( of the terms ) of this at the time and Framework probably just thought of using PROCHOT_CPU and PROCHOT_GPU instead of something like LET_POWER_STABILIZE to get the same effect for when changing power state between AC and DC.

I’m sure AMD did think of this and Framework somehow missed this?

I think they ( Framework ) simply forgot to add some timer ( or something like this ) to turn those flags off after a short period of time and because of the safety/protection behaviour of PROCHOT, this possibly leads to other issues, things going out of sync, turn off, freezes due to lack of power or things not being done at right timings

again, I’m guessing

This thread makes me think Framework could benefit from using TLA+

I have investigated the PROCHOT, see this thread:

Conclusion, the EC does not have any bearing or control over PROCHOT_CPU and PROCHOT_GPU.

The EC can do EDC_CPU which makes everything really really slow.

As an aside, I have also been looking into FTR and FTH.
With the fixes I have made to the EC code, I have not experienced one FTR or FTH since.
My fixes lead to a much more stable VSYS and the battery never discharges when I don’t expect it to and no flipping happens.
I just hope FW use my fixes to benefit everyone else.

3 Likes

Did you see that Framework seems to have updated the EC firmware source code? I asked support for a link to the 3.09 EC firmware source code and they gave me this: GitHub - FrameworkComputer/EmbeddedController at fwk-lotus-azalea-19573

My fixes, are on top of that github version.

1 Like

I see… so, maybe these issues could be resolved for everybody in in 3.10 then :sweat_smile: Do you plan to open a PR to get these fixes upstreamed?

Though looking at how many PRs are open on the repo I am not entirely confident that they’d be merged unless you bug Framework in a support ticket or something :frowning: