AI 300 heatsink on 7x40u?

Some sources say it’s a rebrand, others insist it’s not. In either case, I trust Thermal Grizzly to deliver something decent. It performs well, and at just 10€ I don’t really care about a few bucks.

Others will likely have the same trust questions and will find this thread, so it’s great if we review Honeywell “alternatives”.


I can bump 41W using the original heat sink with PTM7950

So also in the ballpark of your numbers.

Now I am curious how much the new heatsink can do (or, if it power-caps, at what temperature) at max fan speed and maybe also at min fan speed.

When I had to decide between getting the AI 370 and the 7840U it was essentially between the higher TDP (and hopefully less throttling) and the better GPU in the 7840. I decided for the latter and after replacing the stock paste with PTM7950 thermals were ok, but things got a little bit too toasty after I found out about ryzenadj, especially for the VRM.

Found this thread by chance shortly after and took the gamble on a 50€ heatsink with 12€ shipping, ouch. My “brother” @Benn was a little quicker with his test, but I figured it can’t hurt to have two independent ones, so I did mine a few days ago.


Test setup:

  • framework 13 7840U, 2x16GB RAM 5600MHz
  • angled on stand for better airflow, no external fan
  • Linux Mint 22.1, Cinnamon 6.4.8, kernel 6.14.0, bios 3.09
  • cpu testing: s-tui 1.1.6 + stress 1.0.7
  • gpu testing: amdgpu_top 0.10.5 + furmark 2.9.0.0
  • tweaking with ryzenadj 0.16.0 (last release is broken and non-sbu version)
  • default: sudo ryzenadj --slow-limit=30000 --apu-skin-temp=60
  • battery at 90%, power supply plugged in, set to performance
  • all tests at steady state (system warmed up at least 10min at high load)
  • all tests on temperature rise (pause preheat for 15s before starting the test; this gives more consistent results)
  • ambient temperature about 30°C. I did the tests after one another, there might be some ambient temp drift
  • after applying new PTM7950, the system was thermal cycled a few times to allow the material to spread
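For anyone reproducing the default setting above: ryzenadj takes power limits in milliwatts and temperature limits in °C. A minimal sketch that only builds the command line from the setup list and does not execute anything (the helper name `ryzenadj_command` is made up for illustration):

```python
def ryzenadj_command(slow_limit_w: float, apu_skin_temp_c: int) -> list[str]:
    """Build the ryzenadj call from the test setup. Nothing is executed here."""
    slow_limit_mw = round(slow_limit_w * 1000)  # ryzenadj expects mW
    return [
        "sudo",
        "ryzenadj",
        f"--slow-limit={slow_limit_mw}",
        f"--apu-skin-temp={apu_skin_temp_c}",
    ]


if __name__ == "__main__":
    # Default used for the tests: 30W slow limit, 60°C APU skin temperature
    print(" ".join(ryzenadj_command(30, 60)))
```

Which additional limits were raised for the higher power runs is not stated in the post, so they are deliberately left out here.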

A few general observations first:

  • most tools require sudo for full information; s-tui needs the latest kernel to show all sensors
  • focus on CPU testing, GPU testing incomplete
  • that said, I could not hit max clock rate on the GPU, even at 43W no CPU load
  • microstutters in 3D applications are related to X11, perfectly smooth on wayland
  • RAM gets to over 85°C when using the GPU. I added thermal pads between the sticks and the bare board, now it’s about 65°C

Test 0: 7840U stock paste vs. PTM7950

Foolishly I did the first test in normal mode and only have a couple of notes. In normal mode the power limit is 50W peak for 3s, then 35W short term (30s), after which it quickly falls off to 28W sustained.

paste    mode         power  all-core clock  temp
stock    balanced     27.7W  3.2GHz          85°C
ptm7950  balanced     28W    3.37GHz         78°C
ptm7950  performance  30W    3.44GHz         80°C

So just switching out the stock paste for cheap AliExpress PTM7950 improves temps by 7°C at 28W! Crazy how better temps give you a 5% clock boost as well, just because the chip runs a little more efficiently at lower temps. The fan sounded the same, but I did not have a fan rpm measurement at that time.
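The two headline numbers follow directly from the balanced-mode rows of the table. A small sketch of that arithmetic (function names are only for illustration):

```python
def temp_drop_c(before_c: float, after_c: float) -> float:
    """Temperature improvement in °C (positive means cooler)."""
    return before_c - after_c


def clock_gain_percent(before_ghz: float, after_ghz: float) -> float:
    """Relative all-core clock gain in percent."""
    return (after_ghz / before_ghz - 1.0) * 100.0


if __name__ == "__main__":
    # stock paste vs. PTM7950, both in balanced mode at roughly 28W
    print(f"temperature drop: {temp_drop_c(85.0, 78.0):.0f} °C")
    print(f"clock gain:       {clock_gain_percent(3.2, 3.37):.1f} %")
```

Note that the stock run drew 27.7W and the PTM7950 run 28W, so the comparison is close to, but not exactly at, equal power.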


Thermals:

This nice thermal image was taken after running 30W sustained. You can clearly see that the two VRM chips next to the CPU are the hottest, the upper one running at 78.8°C surface temp. Its core temperature will be a bit higher, but it’s also rated at 125°C max, so no problem.

When talking about cooling solutions, temperatures are not the best metric, because they vary with power throughput. The better value is thermal resistance, i.e. how much the temperature rises per watt of power throughput. On top of the CPU we measure about 56°C, so the delta to the on-die sensor (80°C) is 24°C and the thermal resistance Rth is 24°C/30W = 0.8°C/W on this first section.

On the cooling fins we measure an average of about 54°C and 55°C, so the difference to the 56°C on top of the CPU is 2°C and 1°C respectively. Because the pipes are identical, we can assume their Rth is identical too, and therefore the larger difference means a higher power flow through the inner pipe. This makes intuitive sense, because the inner pipe gets cooled with outside air, while the outer pipe only gets the prewarmed air. Assuming the 2:1 ratio (20W and 10W of the 30W), this means the pipes have an effective Rth of 0.1°C/W.

Finally, at an ambient temperature of 30°C the cooler fins have a thermal resistance to the ambient air of 24°C/30W = 0.8°C/W at the given airflow. The nice thing about Rth values is that you can add them up and use the sum for end-to-end thermal calculations:

Rth_sum = 0.8 + 0.1 + 0.8 = 1.7°C/W
T_diff  = Rth_sum * Power = 1.7°C/W * 30W = 51°C
T_CPU   = 30°C (ambient) + 51°C (T_diff)  = 81°C  (1°C rounding error)
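The same chain can be written as a short script, which makes it easy to plug in other readings. This is only a sketch of the calculation above: the temperatures are the thermal camera and sensor values quoted in this post, and the 20W/10W split between the pipes is the 2:1 assumption, not a measurement.

```python
def rth(delta_t_c: float, power_w: float) -> float:
    """Thermal resistance in °C/W for a temperature drop at a given heat flow."""
    return delta_t_c / power_w


def die_temp(ambient_c: float, power_w: float, stages: dict[str, float]) -> float:
    """Series thermal resistances add up, like resistors in a circuit."""
    return ambient_c + sum(stages.values()) * power_w


POWER_W = 30.0    # sustained package power during the thermal image
AMBIENT_C = 30.0  # approximate ambient temperature

# Heat pipes: assume the 30W splits 2:1 between inner and outer pipe.
inner_pipe = rth(56.0 - 54.0, 20.0)  # 2°C drop at 20W
outer_pipe = rth(56.0 - 55.0, 10.0)  # 1°C drop at 10W

stages = {
    "die -> top of CPU": rth(80.0 - 56.0, POWER_W),
    "heat pipe": inner_pipe,  # equals outer_pipe under the 2:1 assumption
    "fins -> air": rth(54.0 - AMBIENT_C, POWER_W),
}

if __name__ == "__main__":
    for name, value in stages.items():
        print(f"{name:<18} {value:.2f} °C/W")
    print(f"predicted die temperature: {die_temp(AMBIENT_C, POWER_W, stages):.0f} °C")
```

The fin-to-air value only holds for the airflow during that measurement, so the model should not be extrapolated to other fan speeds.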

What I wanted to highlight with this exercise is that the heat pipe is already almost perfect and can’t be improved much. What we can do, however, is reduce the thermal resistance into or out of the pipe.


Test 1: 7840U 43W sustained reference

Before swapping to the AI 300 cooler I wanted a reference number and internal temps. As stated before, I did all tests with s-tui. Run it with sudo to get all the sensors. Your view might vary slightly because the SSD and WiFi card have their own sensors. You can clearly see that all graphs are steady state, and the left side provides a detailed numeric readout.

Yes, I was able to hit 43W with PTM7950. But no, it’s not a good idea, because it is cooking the VRM hard at 98.8°C. With some delta to its internal temperature, a closed lid and circulating heat from internal components (NVMe, RAM), the margin is too small for my liking. The surrounding parts won’t like the high temperatures either and may age at an accelerated pace.

I would suggest aiming for 85-90°C VRM temperature for continuous operation, which is reached at about 35W CPU power. Unless… we cool the VRM too.
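As a rough cross-check of that 35W figure, one can interpolate linearly between the two VRM readings mentioned so far (78.8°C at 30W, 98.8°C at 43W). This is a sketch under the assumption that VRM temperature rises linearly with CPU power and that both readings are comparable (same ambient, same measurement method), which two data points cannot confirm.

```python
# (CPU power in W, VRM temperature in °C) as quoted in this post
POINT_LOW = (30.0, 78.8)
POINT_HIGH = (43.0, 98.8)


def vrm_temp_estimate(power_w: float) -> float:
    """Linear interpolation of VRM temperature between the two readings."""
    (w1, t1), (w2, t2) = POINT_LOW, POINT_HIGH
    slope = (t2 - t1) / (w2 - w1)  # °C per W
    return t1 + slope * (power_w - w1)


def power_for_vrm_temp(target_c: float) -> float:
    """Inverse: CPU power at which the estimate reaches target_c."""
    (w1, t1), (w2, t2) = POINT_LOW, POINT_HIGH
    slope = (t2 - t1) / (w2 - w1)
    return w1 + (target_c - t1) / slope


if __name__ == "__main__":
    print(f"VRM at 35W: {vrm_temp_estimate(35.0):.1f} °C")
    print(f"85°C at {power_for_vrm_temp(85.0):.1f} W, 90°C at {power_for_vrm_temp(90.0):.1f} W")
```

Under these assumptions the 85-90°C window corresponds to roughly 34-37W, which is in line with the suggestion above.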


Test 2: AI 300 stock paste

Eager to see the improvement, I swapped coolers and started with 43W. The temperature hit the ceiling immediately and I hastily dialed back the power. Not a great start, but I could be doing something wrong. Anyway, I took test results at 30W and 35W.

Thermal cam images (30W) aren’t that promising either. The keyboard heated up to 45°C in the seams and 43°C on the keycap, compared to the normal temperature of 35-37°C. The reason for this is that the fancy VRM cooler touches the underside of the keyboard, or at the very least is in close proximity. The VRM is doing better than before, but only by 3-4°C. I also noticed a hotspot on the contact plate next to the heat pipe and measured that at 72°C.

Let’s quickly calculate the thermal resistances for comparison with the other cooler. It’s not super scientific, but it should at least give an idea of what contributes most to the worse result:

Rth_cpu  = 25.3°C/35W = 0.723°C/W
Rth_pipe = 0.8°C/35W  = 0.023°C/W
Rth_air  = 35.7°C/35W = 1.02°C/W

This new cooler design has a much worse heat transfer from the pipe to the air. I suspect the issue is the strong gradient of 10°C across the heat pipe to fin contact area. I wonder if this can be mitigated by changing the fin geometry with longer or denser fins to guide the airflow better. A similar approach was taken on the 5090 flow-through cooler.
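To see at a glance which section changed, the Rth values of both coolers can be put side by side. A minimal sketch using only the numbers from this post; keep in mind the old cooler was measured at 30W and the new one at 35W, so fan speed and airflow may not have been identical.

```python
# Thermal resistances in °C/W, taken from the calculations in this post
old_cooler = {"cpu": 24.0 / 30.0, "pipe": 0.1, "air": 24.0 / 30.0}         # 7840U cooler at 30W
new_cooler = {"cpu": 25.3 / 35.0, "pipe": 0.8 / 35.0, "air": 35.7 / 35.0}  # AI 300 cooler at 35W


def rth_difference(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Per-stage change in Rth; positive means the new cooler is worse."""
    return {stage: new[stage] - old[stage] for stage in old}


if __name__ == "__main__":
    for stage, delta in rth_difference(old_cooler, new_cooler).items():
        verdict = "worse" if delta > 0 else "better"
        print(f"{stage:<5} {old_cooler[stage]:.3f} -> {new_cooler[stage]:.3f} °C/W ({verdict})")
```

With these inputs the die-to-pipe and pipe sections come out slightly better on the new cooler, and only the fin-to-air section is worse.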


Test 3: AI 300 PTM7950

Maybe the stock paste is not the same phase change material? I took off the heatsink, cleaned off the paste and noticed contamination on the copper. This was a very thin but hard residue, which I could not wipe off. To give it the best chance, I lapped the cooler to a mirror shine.

Yet, even with a polished contact area and fresh PTM7950, the results look the same within the margin of error.


Test 4: AI 300 no VRM

So what if the VRM adds too much heat into the system, so that it can’t cool down the CPU effectively? Easy test, just remove the pads and reinstall. The issue persists, but at least this proves that cooling the VRM doesn’t make things worse.


Test 5: 7840 with VRM

Now that it is proven that VRM cooling does not harm performance, I figured I could attach an extra copper fin to the heat pipe. I had some 1mm copper sheet, cut off a 7mm wide strip and carefully bent it into a flat Z shape. I insulated it with Kapton tape except for the thermal contact area. Right now it is attached to the heat pipe with shitty thermal adhesive tape, but I plan on soldering it later for reliability. It contacts the VRM through 2x 1.5mm pads stacked. And as the thermal image proves, this worked!


Performance of the AI 300 heat sink

It’s a bummer that the new heatsink performs worse than the old one. Now, I don’t have an AI 300 board to test the 7040 cooler on, which would be a reasonable follow-up experiment. I did notice that the 7840U die does not sit perfectly centered under the heat pipe, but the entire chip is still covered, and in comparison to the other cooler it does not have the “gap” between the pipes.

If this is expected behavior, then I really wonder why Framework didn’t stick with the old design. Is one wide pipe cheaper than two small ones? Does it look more impressive and like a sellable upgrade? Or is it just as unexpected for you as it is for me? In which case I hope you are happy with my hardware bug report :slight_smile:

Oh and one more thing now that I have the attention: The VRMs do have a hardware temperature output signal which is digitized by the power stage controller. Can you please add the reading as a sensor to the EC controller? It would help a lot. Alternatively you could also convince MPS to hand out a datasheet and register map so we can attempt our own I2C driver for that.


Those are smart power stages, they’ll protect themselves before going into thermal runaway. Still though more cooling would not hurt.

Maybe covering the top of the heatsink in some tape could slow down the thermal transfer to the keyboard a bit.

It should be.

Is the vrm wing still touching the underside of the keyboard with the vrm pads removed? My initial thought there was that a too thick thermal pad may be pushing up the wing (which also may make the contact to the soc worse).

I kind of doubt it. I currently think either you got a dud or there is something interfering with the coldplate making proper contact. Maybe they used some components on the 7x40u board that are taller or in a slightly different spot, which prevents proper contact.

I am assuming you checked if the fan rpms were roughly the same.

While I don’t quite understand what exactly @Benn tested it looks like his numbers were slightly better and not worse.

Are you sure it’s one with an actual temperature output? I know they exist, but there are also ones with digital over-temperature outputs that just have one or a few thermal setpoints through internal comparators, so while the power stage may know if it’s too hot or not, it may not have a way to tell how hot exactly.

That would be nice.

Thank you for all your testing, looks like we get mixed results here. Maybe a faulty cooler causes the worse performance?

I’m not worried about thermal runaway, I’m worried about accelerated aging.

It’s difficult to measure the gap, but if there is one, it’s minimal, and air is already a pretty good insulator. It’s a design tradeoff, and I don’t think it can be fixed.

The pads are very squishy, and the heatsink doesn’t touch anything but the CPU die. The mounting pressure is determined by the springs on the mount, which have identical measurements and at least similar force. PTM “squish” looked identical regardless of the cooler.

As I wrote, I even polished the coldplate to ensure contact is perfect. Because the Rth to the heatsink is lower with the AI 300 cooler, I don’t think the issue is located here. Otherwise I would have suspected the slight misalignment (die to heatpipe is offset by 1.5mm, but still fully covered) or a void in the solder connection between heatpipe and coldplate. Maybe there is too little liquid in the pipe, or not the correct pressure? But then the performance in the middle area should be much worse, which it isn’t.

He tested with a lower fan speed, which may influence the performance of the fin stack. So maybe the new cooler performs better at lower RPM.

The primary ones are MPS86941, the secondary smaller ones are MPS86901. All other parts from that family with a public datasheet have an analog temperature output. The main power stage controller is an MP2845B, which has an analog input to digitize these signals and supports I2C. I traced the connection from the temperature output to the controller, and it is connected. But honestly, I don’t want to risk measuring it at runtime for a few internet points.

I can’t think of any defect that causes these results.

Fair point.

Unless the plate deforms slightly downwards (which doesn’t sound too impossible, given it contains the keyboard, which may experience some downwards pressure sometimes), then it’s a very bad insulator.

Well that pretty much rules that theory out then.

Which may be the dud part.

That would cause bad transfer from the soc to the heatpipe, not from the heatpipe to the air. Better contact between heatpipe and soc would lead to a hotter heatpipe (less delta T between soc and heatpipe), all else equal.

That would be a win on its own, but it seems to still outperform your higher rpm across the whole temperature-at-power-level field. In that test report there are like hour-long stretches of 35ish W at way less than “ceiling” temperatures.

Connected to the ec or the cpu itself?

I might if I ever care enough to try.

Bad soldering between the heatpipe and the finstack, for one, could at least partially explain your results. Looking at the thermal images, the temperature in the heatpipe seems to be relatively constant across it, so it does seem to be heatpiping at least somewhat. If it was defective or had too little fluid or something, you would see the thermal gradient go through the roof once you exceed its capacity (may also depend on the orientation, like with the vapour chambers of some early 7900xtx reference models).

If you can tell me which pins on which chips are connected to each other, I might be able to craft some EC code to read the temps from the VRM. I.e. Pin X on the MP connected to Pin Y on the EC.

lower Rth => lower temperature delta at the same power => better performance

Exactly! And that is what I am measuring too. Compared to the old cooler, the new heatpipe over the CPU is hotter despite a lower wattage throughput.

Connection as follows: VRM => VRM controller => EC => CPU

I’ve found a presentation from MPS presenting their AMD/Intel power reference design. Pages 11 and 12 are of particular interest: https://media.monolithicpower.com/mps_cms_document/m/p/mps_solutions_for_powering_intel_and_amd_socs_22.04.2020.pdf

Or watch the presentation on youtube: https://www.youtube.com/watch?v=BT8GyduiHzA

EDIT: Do you have the skill to send and read arbitrary data over I2C? Physically probing the bus and sniffing the data should be pretty safe. I might be able to do that, but I couldn’t do anything with the data.

Regarding the heatsink differences in performance: I know for a fact that on my 11th gen, the performance can differ from heatsink to heatsink. It’s a spectrum / spread of quality… of the joint between the contact plate and the heat pipe, and of the joint between the heat pipe and the fin stack.

I had two heatsinks; one came with the laptop, the other was an extra order. The first heatsink wasn’t cooling the processor as well as I had hoped, so I placed an order for a replacement. Both were polished on a 16000 grit whetstone (Shapton Glass Stone 16000 Grit – Knifewear - Handcrafted Japanese Kitchen Knives). The replacement ran 8°C cooler at peak sustained load.

They have mentioned the new heatsink design here first: Introducing the Framework Laptop 13 powered by AMD Ryzen AI 300 Series

It sounds like the new design will be used in the future and it was revamped for better performance (quiet and cool). I am sure they have tested it before and found it to be better. That is why it is curious that we see worse performance than before. I would expect it to be at least the same or even better.

Alongside that performance, we’ve kept the system quiet and cool with a revamped thermal system. This now leverages one large 10mm heatpipe and Honeywell’s awesome PTM7958 phase change thermal interface material.

Well, the entire system is better, but they are comparing old cooler + traditional paste vs. new cooler + PTM. The large performance increase thanks to the PTM may offset the slightly worse performance of the heatsink.

And if you guys are right and my heatsink is below average, then either quality control is shit or it’s spec’ed very loosely.

Looking at the whole fw16 cooler malarkey I’d say that isn’t entirely outside of the realm of possibilities.

@freernd

Then we would see consistently worse results, but right now we have both very positive and very negative results.

Maybe the product manager @Destroya can tell us more :wink:

Wish I could help more! AI 300 series heatsinks aren’t officially compatible with 7040 series mainboards. They might physically fit, but we don’t consider them compatible and haven’t run any tests to see whether performance would improve or worsen if you tried one. This feels more like a “tinkering” project, something we can’t officially comment on.

That being said, as a big fan of Frankenworks, I personally love seeing what creative builds y’all come up with!

At this point this is less about it fitting on the 7x40u platform and more about there being possible QA issues with them.

You may want to pull out some random ai300 samples and do some thermal testing to see if they also have such variance in cooling performance.

Having done commercial board design before, I know it takes extra effort to keep new revisions compatible with old ones. But I do also understand that the cross-validation can quickly spiral out of hand, and I wouldn’t want to do that either :slight_smile:

I don’t want you to guarantee us compatibility, but please take a second look at the thermal solution and verify that the new HW performs consistently as expected in the combination you sell. I don’t have an AI 300 board, so I can’t verify it.

That would depend on the spread of the quality control / acceptance distribution.

e.g. Something like this:

For example, if the original 7x40u heatsink wasn’t particularly great, then the likelihood of a 300 heatsink being better, relatively, is higher. Compare that to, say, a 7x40u heatsink that was pretty great (in the top 5%): then you have a lower chance of getting a 300 heatsink that performs better.

We don’t know whether the performance distributions of the two heatsink productions overlap or not.

i.e. Quality of the new heatsink is not the only factor. “Better” or “worse” is a relative comparison of before and after (two points). Need to take the “before” into consideration.
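The argument can be made concrete with a toy model. This is purely illustrative: it assumes heatsink quality (expressed as Rth, lower is better) is normally distributed across production, and it uses normalized units (mean 0, standard deviation 1) instead of real data, because nobody here has measured either distribution.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chance_new_is_better(old_rth: float, new_mean: float, new_sd: float) -> float:
    """Probability that a randomly drawn new heatsink has a lower Rth
    than one specific old heatsink, assuming a normal distribution."""
    return normal_cdf((old_rth - new_mean) / new_sd)


if __name__ == "__main__":
    # Hypothetical case: both productions share the same distribution (mean 0, sd 1).
    average_old = 0.0     # an average old heatsink
    top5_old = -1.645     # an old heatsink in the best 5% of its production
    print(f"vs. average old unit: {chance_new_is_better(average_old, 0.0, 1.0):.0%}")
    print(f"vs. top-5% old unit:  {chance_new_is_better(top5_old, 0.0, 1.0):.0%}")
```

Even with identical production quality, swapping a lucky old unit for a random new one would look like a downgrade most of the time, which is why single comparisons say little about the design itself.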

If you have an AI 300 Series mainboard (and a heatsink) and you are experiencing thermal issues, please report them via our support team, they should be able to give you specific tests/tools for triaging and report issues internally to the engineering team.