FW13 HX 370 & 2x48GB RAM - Thermal throttled by RAM

I had to downgrade from 96GB to 64GB. I removed the RF shielding and put thermal pads in its place to sandwich the RAM. That method still didn't work on the 96GB kit: the modules got so hot that they tripped the CPU skin temperature sensor, which throttled the CPU until the skin temperature dropped back to 40C, and the laptop got so hot during gaming that I could barely touch it. With the 64GB kit it gets warm but manageable, with no throttling. It's annoying that Framework decided to pair these things with powerful iGPUs but gave no way of cooling the RAM under load; the RAM essentially gets no airflow.

Here's a picture of what I did as an example: the RAM transfers heat to the keyboard and the mainboard. Since they're 32GB SODIMMs, they don't make the laptop stupidly hot like the 48GB ones do. You want thinner thermal pads under the RAM and thicker ones on top so that the keyboard puts pressure on the RAM. I used 0.5~1mm on the bottom and 1.5~mm on top, though I don't really remember. Just play around with the thicknesses until the pads do their job without visibly deforming the keyboard when you screw everything back in.

@Grandpa_engineer what brand of RAM modules did you use when you had 2x 48GB and what do you use now?
I just received my FW13 and have 2x 48 GB Crucial modules and also suffer from thermal throttling. I wonder if it is worth trying the thermal pads with these modules or if I should return them right away for 2x 32 GB ones.

I went from the Crucial 5600 CL46 96GB kit to the TeamGroup 5600 CL46 64GB kit. Yeah, just return them: even if you sandwich them with thermal pads like I did, they'll get hot enough to make the APU throttle.

This whole thing is weird. No big tech YouTube channels are talking about this issue with DDR5 SODIMMs paired with iGPUs, and I haven't heard of other laptops with the same APU having it (or maybe I didn't look far enough?). Does anyone know if the Framework 16 has this issue too when using its iGPU?

Yeah, this is normal CPU throttling, not RAM throttling. It happens because the cooling solution cannot remove all the heat from the CPU during sustained load. You can use tools like ryzenadj to raise or lower the skin temp limit, to get more performance or a cooler laptop. A laptop stand also improves airflow.

It's both. With the thermal pad method, the 96GB kit gets so hot that it pushes the CPU skin temperature to 50C, which triggers the CPU throttle. Without thermal pads, the 96GB kit goes over 80C under a graphics load and then cuts bandwidth to the iGPU, which you can see in HWiNFO. So you either run the RAM without thermal pads and let it throttle, or you use thermal pads and make the CPU throttle, not to mention the laptop itself becomes untouchable because it's so hot. The 64GB kit runs cooler, so it doesn't cause the CPU to throttle, but you still need thermal pads on the RAM to keep the bandwidth high.

What is the throttling factor in this case?

It’s better to put thermal pads on the RAM and ignore the CPU skin temperature in ryzenadj than to skip the thermal pads and let the RAM throttle.

Wow, thank you @Crazyblox for uncovering the issue. I’ve observed inconsistent performance when running LLMs locally, and this explains it. I’ve got 64+64G Crucial, 5200.

An easy way to repro is to run these three in parallel:

  • stress-ng --memrate 24
  • amdgpu_top
  • watch sensors

When the DDR temperature gets to 79-80C, memory bandwidth drops from 32G/32G read/write to 20G/20G and possibly lower. The bandwidth generally jumps around quite a bit: one measurement is 42G/22G, the next is 32/32, and so on.
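
For anyone who wants to watch the DIMM temperatures from a script instead of `watch sensors`, here is a minimal sketch. It assumes the DIMM sensors are the `spd5118` hwmon devices (that is the driver name in the sensors output posted further down) and that the 79 C warning level is roughly where the bandwidth drop shows up; both are assumptions, and the function name `read_dimm_temps` is made up for this example.

```python
from pathlib import Path


def read_dimm_temps(hwmon_root="/sys/class/hwmon"):
    """Return {hwmon directory name: temperature in C} for every spd5118 sensor."""
    temps = {}
    for hwmon in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = hwmon / "name"
        temp_file = hwmon / "temp1_input"
        if not name_file.is_file() or not temp_file.is_file():
            continue
        # Match on the driver name so the hwmon index does not matter
        if name_file.read_text().strip() != "spd5118":
            continue
        # hwmon reports temperatures in millidegrees Celsius
        temps[hwmon.name] = int(temp_file.read_text()) / 1000
    return temps


if __name__ == "__main__":
    WARN_AT_C = 79.0  # assumption: roughly where the bandwidth drop was observed
    for sensor, temp in read_dimm_temps().items():
        flag = "  <-- near throttle point" if temp >= WARN_AT_C else ""
        print(f"{sensor}: {temp:.1f} C{flag}")
```

Looking sensors up by driver name avoids depending on hwmon numbering, which can change between boots.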

I read somewhere that RF flaps are to reduce EMI, and FW had to add them to pass FCC testing. It may work without them just fine, you’ll have more EM noise though.

Re skin limit temp – with 7040 it was possible to set with ryzenadj, I had to set it higher to get more than 30W out of the cpu. For 300 AI there is no such knob.

I’ve ordered some thermal pads, and if I have time to tinker I’ll mill a 1 mm copper plate to spread the heat to a wider area.

There are also perforated vent covers near the DDR modules, closer to the display. Removing them may increase airflow to the RAM when the keyboard cover is installed. It looks like the fan has some clearance to the keyboard cover, so if it also pulls air from the top, that air must come from somewhere, and there are no other vents at the top.

Hey folks, I found the perfect solution:

DDR temp stays at 75 even with plain 2mm silicone (non-thermal) pads I had on hand, and it doesn’t throttle. Closing the lid does require some extra force, though.

Guess I should've bought a Framework 16 with the GPU expansion so it wouldn't have to use the RAM as VRAM, but it's too late now…

I hope DDR6 LPCAMM2 RAM runs cooler whenever it comes out, or at least that Framework gives it a dedicated heat pipe: either one running alongside the SoC's heat pipe to the fan on the left, or one using the unused vent on the right with a second fan, which could also help cool the SoC, similar to the cooling in the FW16.

What is your skin temp when this happens?

I’m not sure which one is the skin temp. In the output of sensors, possible candidates are edge, local_f75303@4d, and acpitz-acpi-0 temp1/2/3/4. ryzenadj doesn’t list skin temp for AI 300 (it does for the 7040).

Here’s full output of sensors right now (kind of idling):

amdgpu-pci-c100
Adapter: PCI adapter
vddgfx:        0.00 V  
vddnb:         0.00 V  
edge:         +74.0°C  
PPT:          24.09 W  (avg =  24.09 W)
sclk:         601 MHz 

cros_ec-isa-000c
Adapter: ISA adapter
fan1:            3548 RPM
local_f75303@4d:  +63.9°C  
cpu_f75303@4d:    +73.8°C  
ddr_f75303@4d:    +58.9°C  
cpu@4c:           +95.8°C  

ucsi_source_psy_USBC000:003-isa-0000
Adapter: ISA adapter
in0:           3.20 V  (min =  +3.20 V, max =  +3.20 V)
curr1:         2.25 A  (max =  +0.00 A)

spd5118-i2c-3-50
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +64.0°C  (low  =  +0.0°C, high = +55.0°C)  ALARM (HIGH)
                       (crit low =  +0.0°C, crit = +85.0°C)

ucsi_source_psy_USBC000:001-isa-0000
Adapter: ISA adapter
in0:          20.00 V  (min =  +5.00 V, max = +20.00 V)
curr1:         4.60 A  (max =  +4.60 A)

BAT1-acpi-0
Adapter: ACPI interface
in0:          17.48 V  
curr1:         0.00 A  

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +63.8°C  
temp2:        +73.8°C  
temp3:        +58.8°C  
temp4:        +95.8°C  

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +96.0°C  

mt7921_phy0-pci-c000
Adapter: PCI adapter
temp1:        +62.0°C  

ucsi_source_psy_USBC000:004-isa-0000
Adapter: ISA adapter
in0:          20.00 V  (min =  +5.00 V, max = +20.00 V)
curr1:         3.00 A  (max =  +3.00 A)

spd5118-i2c-3-51
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +65.8°C  (low  =  +0.0°C, high = +55.0°C)  ALARM (HIGH)
                       (crit low =  +0.0°C, crit = +85.0°C)

ucsi_source_psy_USBC000:002-isa-0000
Adapter: ISA adapter
in0:           5.00 V  (min =  +5.00 V, max =  +5.00 V)
curr1:         0.00 A  (max =  +1.50 A)

nvme-pci-bf00
Adapter: PCI adapter
Composite:    +54.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +54.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +63.9°C  (low  = -273.1°C, high = +65261.8°C)

acpi_fan-acpi-0
Adapter: ACPI interface
fan1:        3548 RPM
power1:           N/A  

Any ideas which one it could be?

I recorded the output of amdgpu_top and (selected) sensors for a 2x 32GB Kingston Fury kit and a 2x 48GB Crucial kit. See the attached videos. The 32GB modules reach critical temps without thermal pads too; it just takes quite a bit longer to get there. Interestingly, with the 32GB modules at least, the memory bandwidth does not seem to get throttled. During recording I also had a single run with the 2x 48GB kit where the bandwidth didn’t seem to get limited either. Not sure what to make of that.

The recording for the throttling with the 2x 48GB kit:

ram_crucial_throttle

The recording for the 2x 32GB kit reaching critical temps and not throttling memory bandwidth:

ram_kingston_no_throttle

Try ryzenadj --dump-table

Offsets for skin temp:

apu_skin_temp_limit 0x0058
apu_skin_temp_value 0x005C

Thanks! The temp stays around 60-65C with limit of 100C:

| 0x0058 | 0x42C80000 |   100.000 |
| 0x005C | 0x4287452F |    67.635 |
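
In case the middle column looks cryptic: it appears to be the raw 32-bit word, and the last column is the same word read as an IEEE-754 single-precision float. A small sketch to check that reading against the two rows above (the helper name `decode_table_word` is made up for this example):

```python
import struct


def decode_table_word(hex_word: str) -> float:
    """Interpret a 32-bit hex word as an IEEE-754 single-precision float."""
    raw = int(hex_word, 16).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]


print(decode_table_word("0x42C80000"))            # skin temp limit row
print(round(decode_table_word("0x4287452F"), 3))  # skin temp value row
```

The two printed values should match the 100.000 and 67.635 shown in the table.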

It doesn’t seem to affect throttling:

Code:

from dataclasses import dataclass
import subprocess
import struct
import time
from pathlib import Path

gpu_metrics_path = Path('/sys/bus/pci/devices/0000:c1:00.0/gpu_metrics')


def get_skin_temp():
    out = subprocess.check_output(['ryzenadj', '--dump-table'], stderr=subprocess.DEVNULL)
    for line in out.decode('utf8').splitlines():
        if line.startswith('| 0x005C |'):
            return float(line.split('|')[3])

def get_hwmon_temp(idx, fname):
    return float(Path(f'/sys/class/hwmon/hwmon{idx}/{fname}').read_text()) / 1000


@dataclass
class Buf:
    data: bytes
    ptr: int = 0

    def _read(self, n) -> bytes:
        x = self.data[self.ptr: self.ptr + n]
        self.ptr += n
        return x

    def unpack(self, fmt):
        size = struct.calcsize(fmt)
        return struct.unpack(fmt, self._read(size))


prev_throttles = None
t0 = time.monotonic()
outfile = open('metrics.txt', 'w')

while 1:
    buf = Buf(gpu_metrics_path.read_bytes())

    # see drivers/gpu/drm/amd/include/kgd_pp_interface.h
    sz, fmt_rev, content_rev = buf.unpack('<HBB')
    assert (fmt_rev, content_rev) == (3, 0)

    t_gfx, t_soc = buf.unpack('<HH')
    t_cores = buf.unpack('<16H')
    t_skin, = buf.unpack('<H')  # it's always 52.82C
    buf.unpack('<26H')
    dram_reads, dram_writes = buf.unpack('<HH')

    buf.unpack('<HHQIH4I16HHHHHH')
    buf.unpack('<8H')  # clocks
    buf.unpack('<16H')  # core clocks
    maxfreq, gfx_maxfreq = buf.unpack('<2H')  # current max core / gfx frequency
    # print(maxfreq, gfx_maxfreq)

    throttles = buf.unpack('<7I')
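    # change in each throttle counter since the previous sample (all zeros on the first pass)
    diff_throttles = [(cur - prev) for cur, prev in zip(throttles, prev_throttles or throttles)]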
    prev_throttles = throttles
    # print(diff_throttles)

    real_t_skin = get_skin_temp()

    t_ddr0 = get_hwmon_temp(10, 'temp1_input')
    t_ddr1 = get_hwmon_temp(11, 'temp1_input')

    print(time.monotonic() - t0, t_gfx/100, t_soc/100, real_t_skin, t_ddr0, t_ddr1, dram_reads, dram_writes, file=outfile)
    outfile.flush()

    time.sleep(1)


# gnuplot:
"""
set terminal x11 noraise
set y2tics
set ylabel "temps, C"
set y2label "bw, GB/s"
plot "metrics.txt" using 1:4 with lines title "t skin", "metrics.txt" using 1:5 with lines title "t ddr0", "metrics.txt" using 1:6 with lines title "t ddr1", "metrics.txt" using 1:(($7+$8)/1000) with lines title "bw" axis x1y2
"""
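
For picking the throttled stretches out of metrics.txt without eyeballing the plot, here is a rough sketch. It assumes the eight-column layout written by the script above (elapsed, t_gfx, t_soc, skin, ddr0, ddr1, reads, writes) and a steady memory load for the whole run, since an idle period would also show up as low bandwidth. The function name and the 0.7 cutoff are arbitrary choices for illustration.

```python
from pathlib import Path


def find_bandwidth_drops(lines, drop_ratio=0.7):
    """Return (elapsed_s, hottest_ddr_C, bandwidth_MB_s) for samples whose combined
    read+write bandwidth is below drop_ratio times the peak seen in the log."""
    samples = []
    for line in lines:
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or truncated lines
        elapsed = float(fields[0])
        hottest_ddr = max(float(fields[4]), float(fields[5]))
        bandwidth = float(fields[6]) + float(fields[7])
        samples.append((elapsed, hottest_ddr, bandwidth))
    if not samples:
        return []
    peak = max(bw for _, _, bw in samples)
    return [s for s in samples if s[2] < drop_ratio * peak]


if __name__ == "__main__":
    log = Path("metrics.txt")
    if log.exists():
        for elapsed, ddr, bw in find_bandwidth_drops(log.read_text().splitlines()):
            print(f"{elapsed:8.1f} s  ddr={ddr:.1f} C  bw={bw / 1000:.1f} GB/s")
```

Comparing against the peak in the same log keeps the check independent of the absolute bandwidth a given kit reaches.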

Another interesting observation:

  • if I touch the chip in the middle of the DIMM, the reported temp immediately drops, but bandwidth throttling is not affected,
  • if I touch the DDR chips, the drop in reported temp is not that large, but the memory immediately unthrottles.

Meaning that the temp measurements read over I2C aren’t used for throttling, and the memory chips probably “decide” on throttling themselves, maybe in a tight loop with the DDR controller.

Wow, this is very high! Did you set it to 100C? Is this without a thermal pad?